faeval.git

Age	Commit message	Author
2026-04-07	Protocol diagnostic (a): use max per-block growth, not max/min ratio	YurenHao0426
	Old metric: max(||h||) / max(||h_0||, eps). It false-positives on ViT-style architectures because the cls token at layer 0 (right after patch_embed) has anomalously small magnitude (~0.3-1.5), inflating the ratio even on healthy BP-trained ViTs.

	New metric: max_l(||h_{l+1}|| / ||h_l||), the largest single-block residual amplification. Architecture-invariant.

	Calibration:
	- BP-trained, late training: <5x per block
	- BP ViT, early epochs (cls token resolving): 13-25x max
	- DFA-trained ResMLP/ViT: 100-4000x per block

	Threshold raised from 10 to 50 to sit cleanly between healthy-early-training (max 25) and failure-regime (min 100).

	Re-verifications:
	- smoke test (BP/DFA/EP): all 3 verdicts unchanged
	- random init (3 seeds): trustworthy on all 3
	- 5-method audit table single-seed: identical verdicts
	- decision-utility ablation: identical (still 0/5 by S1, 3/5 by S_full)
	- temporal evolution 3-seed: (b) now fires first at ep 3-4, (a) at ep 8-11. Both well before training ends. The 'protocol fires ~92 epochs early' story still holds.
	- ViT temporal evolution: BP no longer false-fires; DFA fires (a) at ep 1, (b) at ep 3. The protocol works on the second architecture.
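	A minimal sketch of the two metrics described in this commit message, assuming the per-layer norms ||h_l|| have already been collected into a list. The function names, the eps default, and the example norms are illustrative assumptions, not taken from the repository; only the two formulas and the thresholds 10 and 50 come from the message above.

```python
from typing import Sequence

OLD_THRESHOLD = 10.0  # threshold paired with the old ratio (from the commit message)
NEW_THRESHOLD = 50.0  # raised threshold paired with the per-block metric


def old_ratio(norms: Sequence[float], eps: float = 1e-8) -> float:
    """Old metric: max(||h||) / max(||h_0||, eps). eps default is an assumption."""
    return max(norms) / max(norms[0], eps)


def max_block_growth(norms: Sequence[float], eps: float = 1e-8) -> float:
    """New metric: max over l of ||h_{l+1}|| / ||h_l||. Needs at least two norms."""
    return max(nxt / max(cur, eps) for cur, nxt in zip(norms, norms[1:]))


# Illustrative norms only: a small layer-0 value followed by a stable stack,
# mimicking the small cls-token magnitude right after patch_embed.
norms = [0.5, 10.0, 12.0, 14.0, 15.0]
print("old metric fires:", old_ratio(norms) > OLD_THRESHOLD)
print("new metric fires:", max_block_growth(norms) > NEW_THRESHOLD)
```

	With these made-up norms the old ratio is dominated by the small layer-0 value, while the per-block metric only sees the single jump from layer 0 to layer 1 and stays under the raised threshold.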
2026-04-07	Add protocol decision-utility ablation table	YurenHao0426
	Builds on the 5-method audit JSON. For each method, evaluates 7 reporting strategies (S0=acc only, S1=+Γ field standard, S2-S5=+single diagnostic, S_full=full protocol), and emits the verdict each strategy would have reached.

	Result: 3 of 5 methods (DFA/SB/CB) are walked back by S_full but NOT by S1. Each of (a) scale, (b) floor, (d) frozen is independently sufficient for binary detection of those 3 failures. Diagnostic (c) stability adds sub-mode discrimination (drift vs noise) but not new positive detections.

	This is the §3 protocol decision-utility evidence.
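	The strategy-by-strategy evaluation can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the mapping of S2-S5 to diagnostics (a)-(d) in that order, the "gamma" check name, the rule that any fired check walks a method back, and the example flags are all hypothetical.

```python
from typing import Dict, Set

# Diagnostics named in the commit message: (a) scale, (b) floor,
# (c) stability, (d) frozen.
ALL_DIAGNOSTICS: Set[str] = {"a", "b", "c", "d"}

# Hypothetical mapping from reporting strategy to the checks it consults.
# The S2..S5 ordering and the "gamma" label are assumptions.
STRATEGIES: Dict[str, Set[str]] = {
    "S0": set(),                           # accuracy only
    "S1": {"gamma"},                       # accuracy + Γ
    "S2": {"a"},
    "S3": {"b"},
    "S4": {"c"},
    "S5": {"d"},
    "S_full": {"gamma"} | ALL_DIAGNOSTICS,  # full protocol
}


def verdicts(fired: Set[str]) -> Dict[str, str]:
    """Return the verdict each strategy reaches, given the checks that fired.

    Assumed rule: a strategy walks the method back if any check it
    consults has fired; otherwise it accepts the reported accuracy.
    """
    return {
        name: "walked back" if checks & fired else "accepted"
        for name, checks in STRATEGIES.items()
    }


# Made-up flags for one method: (a), (b), (d) fire; Γ and (c) do not.
for strategy, verdict in verdicts({"a", "b", "d"}).items():
    print(f"{strategy}: {verdict}")
```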