|
4-panel layout (one per diagnostic), 5 methods sorted bottom-to-top by
ascending accuracy, color-coded healthy (BP/EP, blue) vs degenerate
(DFA/SB/CB, red), with threshold lines drawn:
(a) max per-block growth (log scale, threshold 50x)
(b) ||g_L|| (log scale, floor 1e-7)
(c) cross-batch stability (linear, ceiling 0.30)
(d) headline accuracy (linear, frozen baseline 0.349)
The visual layout makes it immediately obvious that:
- (a) and (b) cleanly split healthy from degenerate (a gap of 4-7 orders of magnitude)
- (c) is bimodal and does not cleanly split the two groups, which confirms
it is a sub-mode discriminator, not a primary detector
- (d) shows BP above the frozen baseline by ~25 percentage points, while
DFA/SB/CB are at or below it
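
The layout described above could be sketched roughly as follows. This is a minimal illustration, not the actual plotting script: the names `Panel`, `PANELS`, `bar_color`, `sort_by_accuracy`, and `draw` are hypothetical, a 1x4 grid of horizontal bars is assumed (the text only says "4-panel" and "bottom-to-top"), and the per-method values must be supplied by the caller. Only the panel titles, scales, thresholds, and the blue/red grouping come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Panel:
    label: str         # panel letter, e.g. "a"
    title: str         # diagnostic shown in the panel
    log_scale: bool    # True for a log x-axis, False for linear
    threshold: float   # position of the threshold line


# Scales and thresholds as listed in the figure description.
PANELS = [
    Panel("a", "max per-block growth", True, 50.0),     # threshold 50x
    Panel("b", "||g_L||", True, 1e-7),                  # floor 1e-7
    Panel("c", "cross-batch stability", False, 0.30),   # ceiling 0.30
    Panel("d", "headline accuracy", False, 0.349),      # frozen baseline
]

HEALTHY = {"BP", "EP"}  # DFA/SB/CB are treated as degenerate


def bar_color(method):
    """Blue for healthy methods, red for degenerate ones."""
    return "tab:blue" if method in HEALTHY else "tab:red"


def sort_by_accuracy(accuracy):
    """Method names in ascending accuracy; the first entry is the bottom bar."""
    return sorted(accuracy, key=accuracy.get)


def draw(values, accuracy, path="diagnostics.png"):
    """Render the four panels.

    values:   {panel label: {method: value}}, supplied by the caller
    accuracy: {method: headline accuracy}, used only for bar ordering
    """
    import matplotlib
    matplotlib.use("Agg")  # assumption: headless rendering to a file
    import matplotlib.pyplot as plt

    methods = sort_by_accuracy(accuracy)
    fig, axes = plt.subplots(1, 4, figsize=(16, 3.5), sharey=True)
    for ax, panel in zip(axes, PANELS):
        data = [values[panel.label][m] for m in methods]
        ax.barh(methods, data, color=[bar_color(m) for m in methods])
        if panel.log_scale:
            ax.set_xscale("log")
        ax.axvline(panel.threshold, color="black", linestyle="--")
        ax.set_title(f"({panel.label}) {panel.title}")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

Keeping the panel settings in a small table separate from the drawing code makes the thresholds easy to check against the text without rendering anything.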
|