<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/protocol/examples/temporal_diagnostic_evolution.py, branch master</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>Cross-architecture temporal validation: 3 archs x 3 seeds x 2 methods</title>
<updated>2026-04-08T04:03:05+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T04:03:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=4172195ca318387e20e3576ab40187d4d2f08ebe'/>
<id>4172195ca318387e20e3576ab40187d4d2f08ebe</id>
<content type='text'>
ResMLP (4-block d=256, with out_ln, CIFAR-10):
  s42:  DFA (a) ep 8,  (b) ep 4,  acc 0.308
  s123: DFA (a) ep 11, (b) ep 4,  acc 0.320
  s456: DFA (a) ep 8,  (b) ep 3,  acc 0.300

ViT-Mini (4-block d=128, cls token + terminal LN, CIFAR-10):
  s42:  DFA (a) ep 1,  (b) ep 3,  acc 0.256
  s123: DFA (a) ep 1,  (b) ep 2,  acc 0.202
  s456: DFA (a) ep 1,  (b) ep 3,  acc 0.253

StudentNet (4-block d=128, NO terminal LN, synthetic alpha=1.0):
  s42:  DFA (a) ep 18, (b) NEVER, acc 0.332
  s123: DFA (a) ep 14, (b) NEVER, acc 0.314
  s456: DFA (a) ep 25, (b) NEVER, acc 0.336

BP: never fires on any seed x any architecture (9/9 sanity passes).

Key cross-architecture finding: diagnostic (b) is specifically the LN-
driven failure mode. Without out_ln, the BP grad never drops to the 1e-7
floor, even though (a) still fires (the residual stream still grows, just
without the LN-cancellation pathology that drives the BP grad to the
floor). This is the causal architectural control: (b) specifically tests
'is terminal-LN gradient cancellation active?' and (a) tests 'is the
residual stream growing without bound?'. They are linked but separable.

This is the §3 cross-architecture validation evidence.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ResMLP (4-block d=256, with out_ln, CIFAR-10):
  s42:  DFA (a) ep 8,  (b) ep 4,  acc 0.308
  s123: DFA (a) ep 11, (b) ep 4,  acc 0.320
  s456: DFA (a) ep 8,  (b) ep 3,  acc 0.300

ViT-Mini (4-block d=128, cls token + terminal LN, CIFAR-10):
  s42:  DFA (a) ep 1,  (b) ep 3,  acc 0.256
  s123: DFA (a) ep 1,  (b) ep 2,  acc 0.202
  s456: DFA (a) ep 1,  (b) ep 3,  acc 0.253

StudentNet (4-block d=128, NO terminal LN, synthetic alpha=1.0):
  s42:  DFA (a) ep 18, (b) NEVER, acc 0.332
  s123: DFA (a) ep 14, (b) NEVER, acc 0.314
  s456: DFA (a) ep 25, (b) NEVER, acc 0.336

BP: never fires on any seed x any architecture (9/9 sanity passes).

Key cross-architecture finding: diagnostic (b) is specifically the LN-
driven failure mode. Without out_ln, the BP grad never drops to the 1e-7
floor, even though (a) still fires (the residual stream still grows, just
without the LN-cancellation pathology that drives the BP grad to the
floor). This is the causal architectural control: (b) specifically tests
'is terminal-LN gradient cancellation active?' and (a) tests 'is the
residual stream growing without bound?'. They are linked but separable.

This is the §3 cross-architecture validation evidence.
</pre>
</div>
</content>
</entry>
<entry>
<title>Protocol diagnostic (a): use max per-block growth, not max/min ratio</title>
<updated>2026-04-08T04:00:54+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T04:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=31ddecc9eb646b15c4ac5960c7de9346c8f7be68'/>
<id>31ddecc9eb646b15c4ac5960c7de9346c8f7be68</id>
<content type='text'>
Old metric: max(||h||) / max(||h_0||, eps). Gives false positives on ViT-style
architectures because the cls token at layer 0 (right after patch_embed)
has anomalously small magnitude (~0.3-1.5), inflating the ratio even on
healthy BP-trained ViTs.

New metric: max_l(||h_{l+1}|| / ||h_l||) — the largest single-block
residual amplification. Architecture-invariant.

Calibration:
  - BP-trained, late training: &lt;5x per block
  - BP ViT, early epochs (cls token resolving): 13-25x max
  - DFA-trained ResMLP/ViT: 100-4000x per block
Threshold raised from 10 to 50 to sit cleanly between healthy-early-
training (max 25) and failure-regime (min 100).

Re-verifications:
  - smoke test (BP/DFA/EP): all 3 verdicts unchanged
  - random init (3 seeds): trustworthy on all 3
  - 5-method audit table single-seed: identical verdicts
  - decision-utility ablation: identical (still 0/5 by S1, 3/5 by S_full)
  - temporal evolution 3-seed: (b) now fires first at ep 3-4, (a) at ep
    8-11. Both well before training ends. The 'protocol fires ~92 epochs
    early' story still holds.
  - ViT temporal evolution: BP no longer false-fires; DFA fires (a) ep 1,
    (b) ep 3 — protocol works on the second architecture.
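
Illustrative sketch of the two metrics described above (not the repository's
code). It assumes the per-layer norms ||h_0||, ..., ||h_L|| are already
available as a list of floats, that max(||h||) in the old metric runs over all
layers, and that the diagnostic fires on a strict "greater than" comparison.
The function names and the example norms are hypothetical; only the threshold
of 50 comes from this message.

```python
def old_ratio(norms, eps=1e-12):
    """Old metric: max(||h||) / max(||h_0||, eps)."""
    return max(norms) / max(norms[0], eps)


def max_block_growth(norms, eps=1e-12):
    """New metric: max_l(||h_{l+1}|| / ||h_l||), largest single-block growth."""
    return max(nxt / max(cur, eps) for cur, nxt in zip(norms, norms[1:]))


GROWTH_THRESHOLD = 50.0  # value stated in this commit message


def fires_a(norms, threshold=GROWTH_THRESHOLD):
    """Diagnostic (a), assuming it fires when growth exceeds the threshold."""
    return max_block_growth(norms) > threshold


# Made-up norms: a healthy ViT-like stack with a small cls token at layer 0,
# and a stack where a single block amplifies the residual stream 200x.
healthy_vit = [0.5, 10.0, 20.0, 30.0, 40.0]
exploding = [5.0, 20.0, 4000.0, 8000.0, 9000.0]

print(old_ratio(healthy_vit), max_block_growth(healthy_vit), fires_a(healthy_vit))
print(max_block_growth(exploding), fires_a(exploding))
```

On the made-up healthy stack the old ratio is dominated by the small layer-0
norm, while the per-block metric stays under the threshold.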
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Old metric: max(||h||) / max(||h_0||, eps). False-positives on ViT-style
architectures because the cls token at layer 0 (right after patch_embed)
has anomalously small magnitude (~0.3-1.5), inflating the ratio even on
healthy BP-trained ViTs.

New metric: max_l(||h_{l+1}|| / ||h_l||) — the largest single-block
residual amplification. Architecture-invariant.

Calibration:
  - BP-trained, late training: &lt;5x per block
  - BP ViT, early epochs (cls token resolving): 13-25x max
  - DFA-trained ResMLP/ViT: 100-4000x per block
Threshold raised from 10 to 50 to sit cleanly between healthy-early-
training (max 25) and failure-regime (min 100).

Re-verifications:
  - smoke test (BP/DFA/EP): all 3 verdicts unchanged
  - random init (3 seeds): trustworthy on all 3
  - 5-method audit table single-seed: identical verdicts
  - decision-utility ablation: identical (still 0/5 by S1, 3/5 by S_full)
  - temporal evolution 3-seed: (b) now fires first at ep 3-4, (a) at ep
    8-11. Both well before training ends. The 'protocol fires ~92 epochs
    early' story still holds.
  - ViT temporal evolution: BP no longer false-fires; DFA fires (a) ep 1,
    (b) ep 3 — protocol works on the second architecture.
</pre>
</div>
</content>
</entry>
<entry>
<title>Temporal evolution 3-seed: protocol fires at DFA epoch 3-4 on all seeds</title>
<updated>2026-04-08T03:51:27+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T03:51:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=4420af372024ef12b28eac21678504dd75484dca'/>
<id>4420af372024ef12b28eac21678504dd75484dca</id>
<content type='text'>
  s42: (a)+(b) fire at epoch 4, DFA final acc 0.3076
  s123: (a)+(b) fire at epoch 4, DFA final acc 0.3203
  s456: (a)+(b) fire at epoch 3, DFA final acc 0.2998

BP never fires on any seed (final acc 0.61-0.63).

The 'protocol catches it 96 epochs early' finding is fully reproducible
across seeds.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
  s42: (a)+(b) fire at epoch 4, DFA final acc 0.3076
  s123: (a)+(b) fire at epoch 4, DFA final acc 0.3203
  s456: (a)+(b) fire at epoch 3, DFA final acc 0.2998

BP never fires on any seed (final acc 0.61-0.63).

The 'protocol catches it 96 epochs early' finding is fully reproducible
across seeds.
</pre>
</div>
</content>
</entry>
<entry>
<title>Add temporal diagnostic evolution: protocol fires at epoch 4 of DFA</title>
<updated>2026-04-08T03:49:53+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T03:49:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=a89ef4dee2750dd7bddbe1fd0a1b94d1f74d6f9c'/>
<id>a89ef4dee2750dd7bddbe1fd0a1b94d1f74d6f9c</id>
<content type='text'>
Replays per-epoch logged data from results/snapshot_evolution_v2/ through
the protocol thresholds.

Result: diagnostics (a) ||h_l|| explosion AND (b) ||g_L|| at floor BOTH
first fire at epoch 4 of DFA training. At that point, DFA test acc is
0.308 — its final value at epoch 100 is also 0.308. The protocol could
have walked back the headline 96 epochs before training finished.

DFA's gamma hovers at 0.087-0.107 for all 100 epochs. A reviewer looking
at acc+gamma would conclude 'DFA is hovering at 31% acc with ~0.10
alignment, both reasonable'. Wrong on both counts.

BP never fires any diagnostic at any epoch. Stays bounded at ||h_L||~200,
||g_L||~3-5e-5, accuracy climbs to 0.61.

This is the temporal validation of decision utility: the protocol catches
the pathology AS IT HAPPENS, not just retrospectively.
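
Illustrative sketch of the replay idea described above (not the script in this
commit). The record layout, the field names "epoch", "growth" and "grad_norm",
and all numeric values in the example logs are hypothetical; the actual format
under results/snapshot_evolution_v2/ is not shown here. Both thresholds are
passed in as parameters rather than assumed.

```python
def first_firing_epoch(records, fires):
    """Return the first epoch whose record satisfies fires(record), else None."""
    for rec in sorted(records, key=lambda r: r["epoch"]):
        if fires(rec):
            return rec["epoch"]
    return None


def replay(records, growth_threshold, grad_floor):
    """Replay logged per-epoch stats through both diagnostics."""
    return {
        # (a): hidden-state growth statistic above the threshold
        "a": first_firing_epoch(records, lambda r: r["growth"] > growth_threshold),
        # (b): last-layer gradient norm at or below the floor
        "b": first_firing_epoch(records, lambda r: grad_floor >= r["grad_norm"]),
    }


# Made-up logs: one run that degrades over time, one that stays healthy.
degrading = [
    {"epoch": 1, "growth": 3.0, "grad_norm": 4e-5},
    {"epoch": 2, "growth": 12.0, "grad_norm": 1e-6},
    {"epoch": 3, "growth": 30.0, "grad_norm": 5e-8},
    {"epoch": 4, "growth": 120.0, "grad_norm": 1e-8},
]
healthy = [
    {"epoch": e, "growth": 2.0, "grad_norm": 4e-5} for e in range(1, 5)
]

print(replay(degrading, growth_threshold=50.0, grad_floor=1e-7))
print(replay(healthy, growth_threshold=50.0, grad_floor=1e-7))
```

Returning None for a run that never fires keeps the "never fires" case
distinct from any real epoch number.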
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Replays per-epoch logged data from results/snapshot_evolution_v2/ through
the protocol thresholds.

Result: diagnostics (a) ||h_l|| explosion AND (b) ||g_L|| at floor BOTH
first fire at epoch 4 of DFA training. At that point, DFA test acc is
0.308 — its final value at epoch 100 is also 0.308. The protocol could
have walked back the headline 96 epochs before training finished.

DFA's gamma hovers at 0.087-0.107 for all 100 epochs. A reviewer looking
at acc+gamma would conclude 'DFA is hovering at 31% acc with ~0.10
alignment, both reasonable'. Wrong on both counts.

BP never fires any diagnostic at any epoch. Stays bounded at ||h_L||~200,
||g_L||~3-5e-5, accuracy climbs to 0.61.

This is the temporal validation of decision utility: the protocol catches
the pathology AS IT HAPPENS, not just retrospectively.
</pre>
</div>
</content>
</entry>
</feed>
