| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 18:32:23 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 18:32:23 -0500 |
| commit | 0c1d102c57d86d914eb1122dd59f329667db60d8 (patch) | |
| tree | 04d8e676817fa9f243d466686efdfcb2883bff99 /paper/figures | |
| parent | 2b4581723d0c5ed562528fac6b0a789adf95e3c5 (diff) | |
paper v2.31.9: relabel "StudentNet" → "no-terminal-LN ResMLP"
The §3 ¶3 / §5 ¶3 / Figure 5 / §7 mentions of "StudentNet" as a
cross-architecture validation case were a misleading rebrand of the
no-terminal-LN ResMLP-d256 ablation. Verified by tracing the data:
  results/protocol_audit/temporal_evolution_s{42,123,456}.json
    final_acc     0.332/0.313/0.336 (matches no-outln 3-seed 0.327±0.012)
    first_fire_a  {18, 14, 25}
    first_fire_b  None / None / None
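The "matches 0.327±0.012" claim can be re-derived from the three per-seed accuracies quoted above. This is a minimal sketch, assuming the ± figure is a sample standard deviation (n - 1 denominator); the commit message does not say which estimator was used, and the variable names are illustrative.

```python
from statistics import mean, stdev

# Per-seed final accuracies quoted in the commit message (seeds 42, 123, 456).
final_acc = [0.332, 0.313, 0.336]

seed_mean = mean(final_acc)
# ASSUMPTION: sample standard deviation (n - 1 denominator).
seed_std = stdev(final_acc)

print(f"{seed_mean:.3f} ± {seed_std:.3f}")  # prints "0.327 ± 0.012"
```

Under that assumption the quoted summary is reproduced; a population standard deviation (`statistics.pstdev`) would round to 0.010 instead.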
The actual synth StudentNet (results/snapshot_synth_v1, d=128 alpha=1.0)
has max-per-block growth ~6.88 over 80 epochs and never reaches the
50× threshold, so diagnostic (a) does NOT fire on the real synth
StudentNet at all. Calling the no-outln data "StudentNet" double-counted
the same architecture under two names (as the same-backbone causal
control AND as the cross-arch generalization test).
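The first-fire logic described above can be sketched as follows. This is only an illustration, assuming diagnostic (a) fires at the first epoch where the max-per-block growth ratio reaches the 50× threshold; the function name `first_fire_epoch`, the 0-based epoch indexing, and both example trajectories are hypothetical and not taken from the repository.

```python
from typing import Optional, Sequence


def first_fire_epoch(growth: Sequence[float], threshold: float = 50.0) -> Optional[int]:
    """Return the first 0-based epoch whose growth ratio reaches `threshold`, else None."""
    for epoch, ratio in enumerate(growth):
        if ratio >= threshold:
            return epoch
    return None


# Illustrative trajectories only, NOT data from the repository.
slow = [1.0, 2.5, 4.0, 6.88]         # peaks at 6.88, far below 50
fast = [1.0, 8.0, 30.0, 55.0, 90.0]  # crosses 50 at index 3

print(first_fire_epoch(slow))  # None
print(first_fire_epoch(fast))  # 3
```

A trajectory that peaks near 6.88, like the synth StudentNet described above, returns `None`, which corresponds to "diagnostic (a) does NOT fire".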
Relabeled to "no-terminal-LN ResMLP" everywhere it appeared:
- §3 ¶3 paragraph 1 cross-arch list
- §3 ¶3 paragraph 2 (now with explicit per-seed first-fire epochs {18,14,25})
- §5 paragraph (the conclusion)
- §7 conclusion (cross-arch list)
- Figure 5 caption
- Figure 5 row label (with re-rendered PDF)
The remaining cross-arch generalization claim is now: ViT-Mini fires
both diagnostics, ResMLP at d=256/d=512 fires both, no-terminal-LN
ResMLP and BatchNorm CNN fire only (a) — three real architecture
classes, with the no-LN ablation being the same-backbone control rather
than a separate architecture. The cross-arch story is slightly weaker
("3 architecture classes" not "4") but truthful and self-consistent.
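The grouping stated above can be checked against the verdict rows in this commit's `render_fig5_cross_arch.py` hunk. This is a minimal sketch that copies the five rows of the `dfa` array into a plain dictionary and reads a 1 in columns (a) and (b) as "fires", following the per-row comments in the script; the dictionary and list names are illustrative.

```python
# Rows copied from the `dfa` array in render_fig5_cross_arch.py.
# Columns: (a) scale, (b) ||g|| floor, (c) drift, (d) frozen.
# ASSUMPTION: 1 means "diagnostic fires", as the per-row comments suggest.
verdicts = {
    "ResMLP d=256 (terminal LN)": [1, 1, 0, 1],
    "ResMLP d=512 (terminal LN)": [1, 1, 0, 1],
    "ViT-Mini (cls + LN)": [1, 1, 0, 1],
    "ResMLP d=256 (no terminal LN)": [1, 0, 0, 0],
    "CNN BatchNorm (no terminal LN)": [1, 0, 0, 0],
}

A, B = 0, 1  # column indices for diagnostics (a) and (b)

fires_both = [name for name, row in verdicts.items() if row[A] == 1 and row[B] == 1]
only_a = [name for name, row in verdicts.items() if row[A] == 1 and row[B] == 0]

print("fires (a) and (b):", fires_both)
print("fires only (a):  ", only_a)
```

Read this way, the three terminal-LN rows fire both diagnostics and the two rows without a terminal LN fire only (a), which matches the summary in the commit message.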
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'paper/figures')
| -rw-r--r-- | paper/figures/fig5_cross_arch_summary.pdf | bin | 32029 -> 31577 bytes |
| -rw-r--r-- | paper/figures/render_fig5_cross_arch.py | 6 | |
2 files changed, 3 insertions, 3 deletions
diff --git a/paper/figures/fig5_cross_arch_summary.pdf b/paper/figures/fig5_cross_arch_summary.pdf
Binary files differ
index 00267a7..efab2b8 100644
--- a/paper/figures/fig5_cross_arch_summary.pdf
+++ b/paper/figures/fig5_cross_arch_summary.pdf
diff --git a/paper/figures/render_fig5_cross_arch.py b/paper/figures/render_fig5_cross_arch.py
index 9ad9ce2..9d52e09 100644
--- a/paper/figures/render_fig5_cross_arch.py
+++ b/paper/figures/render_fig5_cross_arch.py
@@ -10,8 +10,8 @@ REPO_ROOT = "/home/yurenh2/fa"
 # Verdict matrix: arch x diagnostic
 # 0 = ok (BP), 1 = ok-non-LN-arch, 2 = walk-back
 # Columns: (a) per-block growth, (b) ||g_L|| floor, (c) drift stability, (d) frozen baseline
-# Rows: ResMLP-d256, ResMLP-d512, ViT-Mini, StudentNet (no LN), CNN (BN, no LN)
-arches = ["ResMLP $d{=}256$\n(terminal LN)", "ResMLP $d{=}512$\n(terminal LN)", "ViT-Mini\n(cls + LN)", "StudentNet\n(no terminal LN)", "CNN BatchNorm\n(no terminal LN)"]
+# Rows: ResMLP-d256, ResMLP-d512, ViT-Mini, no-terminal-LN ResMLP-d256, CNN (BN, no LN)
+arches = ["ResMLP $d{=}256$\n(terminal LN)", "ResMLP $d{=}512$\n(terminal LN)", "ViT-Mini\n(cls + LN)", "ResMLP $d{=}256$\n(no terminal LN)", "CNN BatchNorm\n(no terminal LN)"]
 diags = ["(a) scale", "(b) ${\\|g\\|}$ floor", "(c) drift", "(d) frozen"]

 # DFA verdicts on each
@@ -20,7 +20,7 @@ dfa = np.array([
     [1, 1, 0, 1], # ResMLP d256: (a) fires, (b) fires, (c) noise sub-mode, (d) fires
     [1, 1, 0, 1], # ResMLP d512: same pattern
     [1, 1, 0, 1], # ViT-Mini: same pattern
-    [1, 0, 0, 0], # StudentNet: only (a) fires; (b) NEVER
+    [1, 0, 0, 0], # ResMLP no-LN: only (a) fires; (b) NEVER
     [1, 0, 0, 0], # CNN BN: only (a) fires; (b) NEVER (the killer (b)-is-LN-specific finding)
 ])
