| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:58:04 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:58:04 -0500 |
| commit | ab1b783c7a4f3d586d082ba142d7c046453a310c (patch) | |
| tree | 78a19b2baceea99f86f3608ce0d6e7f728649a78 /results/confirmatory/clean_sparsity/synth_bp_s123_a0.0_L4.json | |
| parent | cbe851cf382a2af13037304afdd783214bad5c6b (diff) | |
CHECKLIST pitfall #6: layer-0 dominance is ResMLP-specific, not universal
Verified by extracting per-layer gamma_dfa from the existing ViT-Mini snapshot
JSON (3 seeds, final epoch). On ViT, all 4 layers have per-layer cosine
near zero (~0.001 with eps clamp); no layer dominates. Compare with ResMLP,
where layer 0 has +0.42 and layers 1-4 are essentially zero.
The pitfall is real on ResMLP, but the specific 'layer 0 dominates' framing
doesn't generalize to ViT. Reframed as 'aggregation hides per-layer
structure'; the lesson is to always report per-layer values, regardless of
which architecture-specific pattern you might be hiding.
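A minimal sketch of the "report per-layer, not just the aggregate" lesson, assuming gamma_dfa is a per-layer cosine between two update vectors and that the eps clamp bounds each norm from below. The helper names `cosine_eps` and `summarize` and the `max_share` statistic are hypothetical illustrations, not code from this repository. The example values paraphrase the ResMLP and ViT patterns described above, with "essentially zero" simplified to exactly 0.0.

```python
import math
from typing import Dict, List, Sequence


def cosine_eps(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity with each norm clamped below at eps (hypothetical helper).

    The clamp avoids division by zero when one vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = max(math.sqrt(sum(x * x for x in a)), eps)
    norm_b = max(math.sqrt(sum(y * y for y in b)), eps)
    return dot / (norm_a * norm_b)


def summarize(per_layer: List[float]) -> Dict[str, object]:
    """Report per-layer values next to the aggregate, never the aggregate alone."""
    total_abs = sum(abs(v) for v in per_layer)
    return {
        "per_layer": {f"layer_{i}": v for i, v in enumerate(per_layer)},
        "mean": sum(per_layer) / len(per_layer),
        # Fraction of total |value| carried by the single largest layer:
        # 1.0 means one layer carries everything; 1/n means a uniform spread.
        "max_share": max(abs(v) for v in per_layer) / total_abs if total_abs else 0.0,
    }


# Illustrative values only, shaped like the patterns in the commit message.
resmlp = summarize([0.42, 0.0, 0.0, 0.0, 0.0])
vit = summarize([0.001, 0.001, 0.001, 0.001])

print(round(resmlp["mean"], 3), resmlp["max_share"])
print(round(vit["max_share"], 2))
```

The ResMLP mean of about 0.084 on its own reads like weak alignment spread across all layers. `max_share` and the per-layer dict show it is concentrated entirely in layer 0, while the ViT values are spread evenly, with no dominant layer.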
Diffstat (limited to 'results/confirmatory/clean_sparsity/synth_bp_s123_a0.0_L4.json')
0 files changed, 0 insertions, 0 deletions
