author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-07 23:58:04 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-07 23:58:04 -0500
commit: ab1b783c7a4f3d586d082ba142d7c046453a310c (patch)
tree: 78a19b2baceea99f86f3608ce0d6e7f728649a78 /protocol
parent: cbe851cf382a2af13037304afdd783214bad5c6b (diff)
CHECKLIST pitfall #6: layer-0 dominance is ResMLP-specific, not universal
Verified by extracting per-layer gamma_dfa from existing ViT-Mini snapshot JSON (3 seeds, final epoch). On ViT all 4 layers have per-layer cosine near zero (~0.001 with eps clamp); no layer dominates. Compare to ResMLP where layer 0 has +0.42 and layers 1-4 are essentially zero. The pitfall is real on ResMLP but the specific 'layer 0 dominates' framing doesn't generalize to ViT. Reframed as 'aggregation hides per-layer structure'; lesson is to always report per-layer values regardless of which architecture-specific pattern you might be hiding.
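The point about aggregation can be illustrated with a minimal sketch. It is not code from this repository: the function name `summarize_gamma`, both thresholds, and the exact zero values for the ResMLP block layers are assumptions made for illustration. Only the rough magnitudes (+0.42 for ResMLP layer 0, about 0.001 per ViT-Mini layer) come from the commit message above.

```python
def summarize_gamma(per_layer, dominance_share=0.5, near_zero=0.01):
    """Summarize per-layer alignment values without hiding their structure.

    dominance_share and near_zero are hypothetical thresholds chosen for
    this sketch; they are not taken from the repository.
    """
    mean = sum(per_layer) / len(per_layer)
    total = sum(abs(g) for g in per_layer)
    largest = max(abs(g) for g in per_layer)

    dominant = None
    # Only look for a dominant layer if at least one value is non-negligible.
    if total > 0 and largest > near_zero:
        idx = max(range(len(per_layer)), key=lambda i: abs(per_layer[i]))
        if abs(per_layer[idx]) / total > dominance_share:
            dominant = idx

    return {
        "mean": mean,
        "per_layer": list(per_layer),
        "dominant_layer": dominant,
        "all_near_zero": all(abs(g) <= near_zero for g in per_layer),
    }


# Illustrative values only; "essentially zero" layers are written as 0.0.
resmlp = [0.42, 0.0, 0.0, 0.0, 0.0]   # layer 0 plus layers 1-4
vit_mini = [0.001, 0.001, 0.001, 0.001]  # 4 layers, all near zero

for name, values in [("ResMLP", resmlp), ("ViT-Mini", vit_mini)]:
    summary = summarize_gamma(values)
    print(name, summary)
```

Under these assumptions the two architectures produce different flags from the same summary function: one reports a dominant layer, the other reports that every layer is near zero. A single mean would not distinguish either case from a model with moderate alignment spread evenly across layers.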
Diffstat (limited to 'protocol')
-rw-r--r-- protocol/CHECKLIST.md 22
1 files changed, 15 insertions, 7 deletions
diff --git a/protocol/CHECKLIST.md b/protocol/CHECKLIST.md
index 6302b32..ac36b5e 100644
--- a/protocol/CHECKLIST.md
+++ b/protocol/CHECKLIST.md
@@ -90,15 +90,23 @@ alignment.
If any of these are missing, your post-hoc Γ measurement is undefined.
Report `Γ = N/A` in your tables, NOT a fallback value of 1.0.
-## 6. Layer-0 dominates the headline Γ; deeper layers are ~0
+## 6. Per-layer Γ structure is hidden by aggregation (layer-0 dominance is one mode)
For DFA on a 4-block ResMLP, the headline Γ ≈ 0.10 is driven almost entirely
-by the embedding layer (Γ_layer0 ≈ 0.43). The block layers have Γ ≈ 0. A
-"mean over layers" summary statistic hides this. The same pattern likely
-holds for other FA-style methods.
-
-**Check**: always report per-layer Γ. A single average is misleading when
-one layer dominates.
+by the embedding layer (Γ_layer0 ≈ +0.42, layers 1-4 ≈ 0). The "mean over
+layers" summary statistic hides this structure.
+
+**This pattern is architecture-specific, not universal.** On ViT-Mini, the
+per-layer Γ is uniformly near zero across all layers (no single layer
+dominates). On ResMLP the dominance is real and severe. The general lesson
+is not "layer 0 always dominates" — it is that **the aggregation hides
+per-layer structure that depends on the input preprocessing and the
+architecture's interaction with random feedback `Bs`**.
+
+**Check**: always report per-layer Γ. A single average can mislead in
+either direction — by hiding dominance (ResMLP-style) or by averaging
+over layers that all measure essentially the same degenerate quantity
+(ViT-style).
## Suggested final-pass workflow