From 0c1d102c57d86d914eb1122dd59f329667db60d8 Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 8 Apr 2026 18:32:23 -0500
Subject: paper v2.31.9: relabel "StudentNet" → "no-terminal-LN ResMLP"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The §3 ¶3 / §5 ¶3 / Figure 5 / §7 mentions of "StudentNet" as a
cross-architecture validation case were a misleading rebrand of the
no-terminal-LN ResMLP-d256 ablation. Verified by tracing the data:

  results/protocol_audit/temporal_evolution_s{42,123,456}.json
    final_acc     0.332/0.313/0.336  (matches no-outln 3-seed 0.327±0.012)
    first_fire_a  {18, 14, 25}
    first_fire_b  None / None / None

The actual synth StudentNet (results/snapshot_synth_v1, d=128 alpha=1.0)
has max-per-block growth ~6.88 over 80 epochs and never reaches the 50×
threshold, so diagnostic (a) does NOT fire on the real synth StudentNet
at all. Calling the no-outln data "StudentNet" double-counted the same
architecture under two names (the same-backbone causal control AND the
cross-arch generalization test).

Relabeled to "no-terminal-LN ResMLP" everywhere it appeared:
  - §3 ¶3 paragraph 1 cross-arch list
  - §3 ¶3 paragraph 2 (now with explicit per-seed first-fire epochs
    {18,14,25})
  - §5 paragraph (the conclusion)
  - §7 conclusion (cross-arch list)
  - Figure 5 caption
  - Figure 5 row label (with re-rendered PDF)

The remaining cross-arch generalization claim is now: ViT-Mini fires
both diagnostics, ResMLP at d=256/d=512 fires both, no-terminal-LN
ResMLP and BatchNorm CNN fire only (a): three real architecture
classes, with the no-LN ablation being the same-backbone control rather
than a separate architecture. The cross-arch story is slightly weaker
("3 architecture classes" not "4") but truthful and self-consistent.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 paper/figures/render_fig5_cross_arch.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'paper/figures/render_fig5_cross_arch.py')

diff --git a/paper/figures/render_fig5_cross_arch.py b/paper/figures/render_fig5_cross_arch.py
index 9ad9ce2..9d52e09 100644
--- a/paper/figures/render_fig5_cross_arch.py
+++ b/paper/figures/render_fig5_cross_arch.py
@@ -10,8 +10,8 @@ REPO_ROOT = "/home/yurenh2/fa"
 # Verdict matrix: arch x diagnostic
 # 0 = ok (BP), 1 = ok-non-LN-arch, 2 = walk-back
 # Columns: (a) per-block growth, (b) ||g_L|| floor, (c) drift stability, (d) frozen baseline
-# Rows: ResMLP-d256, ResMLP-d512, ViT-Mini, StudentNet (no LN), CNN (BN, no LN)
-arches = ["ResMLP $d{=}256$\n(terminal LN)", "ResMLP $d{=}512$\n(terminal LN)", "ViT-Mini\n(cls + LN)", "StudentNet\n(no terminal LN)", "CNN BatchNorm\n(no terminal LN)"]
+# Rows: ResMLP-d256, ResMLP-d512, ViT-Mini, no-terminal-LN ResMLP-d256, CNN (BN, no LN)
+arches = ["ResMLP $d{=}256$\n(terminal LN)", "ResMLP $d{=}512$\n(terminal LN)", "ViT-Mini\n(cls + LN)", "ResMLP $d{=}256$\n(no terminal LN)", "CNN BatchNorm\n(no terminal LN)"]
 diags = ["(a) scale", "(b) ${\\|g\\|}$ floor", "(c) drift", "(d) frozen"]
 
 # DFA verdicts on each
@@ -20,7 +20,7 @@ dfa = np.array([
     [1, 1, 0, 1],  # ResMLP d256: (a) fires, (b) fires, (c) noise sub-mode, (d) fires
     [1, 1, 0, 1],  # ResMLP d512: same pattern
     [1, 1, 0, 1],  # ViT-Mini: same pattern
-    [1, 0, 0, 0],  # StudentNet: only (a) fires; (b) NEVER
+    [1, 0, 0, 0],  # ResMLP no-LN: only (a) fires; (b) NEVER
     [1, 0, 0, 0],  # CNN BN: only (a) fires; (b) NEVER (the killer (b)-is-LN-specific finding)
 ])
 
-- 
cgit v1.2.3
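The data trace in the commit message can be re-checked with a few lines of Python. This is a minimal sketch, not the repository's audit code. The three final_acc values and the 50× threshold are taken from the message. The helper name first_fire_epoch, the `>=` comparison, and the example growth traces are assumptions made for illustration. The quoted ±0.012 is consistent with a sample (n - 1) standard deviation, though the message does not say which estimator was used.

```python
import statistics

# Per-seed final accuracies quoted in the commit message (seeds 42, 123, 456).
final_acc = [0.332, 0.313, 0.336]

mean_acc = statistics.mean(final_acc)
std_acc = statistics.stdev(final_acc)  # sample standard deviation (n - 1); an assumption

# Formatted the same way the commit message reports the 3-seed figure.
summary = f"{mean_acc:.3f}±{std_acc:.3f}"


def first_fire_epoch(growth_by_epoch, threshold=50.0):
    """Return the first epoch index whose growth reaches `threshold`, else None.

    Hypothetical helper: the 50x threshold comes from the commit message,
    but the exact comparison used by the real diagnostic is assumed here.
    """
    for epoch, growth in enumerate(growth_by_epoch):
        if growth >= threshold:
            return epoch
    return None


if __name__ == "__main__":
    print(summary)
    # A made-up trace that peaks near 6.88 never crosses 50x, so nothing fires.
    print(first_fire_epoch([1.0, 3.2, 6.88]))
```

Under these assumptions the summary reproduces the 0.327±0.012 figure, and a trace that peaks at about 6.88 returns None, which matches the statement that diagnostic (a) does not fire on the synth StudentNet.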