Diffstat (limited to 'protocol')
| -rw-r--r-- | protocol/CHECKLIST.md | 22 |
1 files changed, 15 insertions, 7 deletions
diff --git a/protocol/CHECKLIST.md b/protocol/CHECKLIST.md
index 6302b32..ac36b5e 100644
--- a/protocol/CHECKLIST.md
+++ b/protocol/CHECKLIST.md
@@ -90,15 +90,23 @@ alignment.
 If any of these are missing, your post-hoc Γ measurement is undefined.
 Report `Γ = N/A` in your tables, NOT a fallback value of 1.0.
 
-## 6. Layer-0 dominates the headline Γ; deeper layers are ~0
+## 6. Per-layer Γ structure is hidden by aggregation (layer-0 dominance is one mode)
 
 For DFA on a 4-block ResMLP, the headline Γ ≈ 0.10 is driven almost entirely
-by the embedding layer (Γ_layer0 ≈ 0.43). The block layers have Γ ≈ 0. A
-"mean over layers" summary statistic hides this. The same pattern likely
-holds for other FA-style methods.
-
-**Check**: always report per-layer Γ. A single average is misleading when
-one layer dominates.
+by the embedding layer (Γ_layer0 ≈ +0.42, layers 1-4 ≈ 0). The "mean over
+layers" summary statistic hides this structure.
+
+**This pattern is architecture-specific, not universal.** On ViT-Mini, the
+per-layer Γ is uniformly near zero across all layers (no single layer
+dominates). On ResMLP the dominance is real and severe. The general lesson
+is not "layer 0 always dominates" — it is that **the aggregation hides
+per-layer structure that depends on the input preprocessing and the
+architecture's interaction with random feedback `Bs`**.
+
+**Check**: always report per-layer Γ. A single average can mislead in
+either direction — by hiding dominance (ResMLP-style) or by averaging
+over layers that all measure essentially the same degenerate quantity
+(ViT-style).
 
 ## Suggested final-pass workflow
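
Reviewer note: the **Check** in section 6 could be sketched roughly as follows, assuming per-layer Γ values have already been measured. The function name `summarize_gamma`, both thresholds, and the input values are illustrative assumptions and are not taken from the protocol repository. How the headline Γ is actually aggregated is not stated in the patch, so a plain mean over defined layers is used here only as a stand-in.

```python
import math
from typing import Optional, Sequence


def summarize_gamma(
    per_layer: Sequence[Optional[float]],
    dominance_share: float = 0.8,  # hypothetical: share of total |Γ| that counts as "dominant"
    near_zero: float = 0.05,       # hypothetical: below this, a layer's Γ is treated as ~0
) -> dict:
    """Report per-layer Γ next to the aggregate instead of the aggregate alone."""

    def is_defined(g: Optional[float]) -> bool:
        return g is not None and not math.isnan(g)

    # Undefined layers are reported as "N/A", never as a fallback value such as 1.0.
    labels = [f"{g:+.2f}" if is_defined(g) else "N/A" for g in per_layer]
    defined = [(i, g) for i, g in enumerate(per_layer) if is_defined(g)]
    if not defined:
        return {"per_layer": labels, "mean": "N/A",
                "mode": "undefined", "dominant_layer": None}

    # Assumption: the aggregate is a plain mean over the layers that are defined.
    mean = sum(g for _, g in defined) / len(defined)
    top_i, top_g = max(defined, key=lambda pair: abs(pair[1]))
    total = sum(abs(g) for _, g in defined)

    if abs(top_g) < near_zero:
        # ViT-style: every layer is close to zero, so no layer dominates.
        mode, dominant = "near_zero", None
    elif total > 0 and abs(top_g) / total >= dominance_share:
        # ResMLP-style: one layer carries most of the total magnitude.
        mode, dominant = "dominated", top_i
    else:
        mode, dominant = "mixed", None

    return {"per_layer": labels, "mean": mean,
            "mode": mode, "dominant_layer": dominant}


if __name__ == "__main__":
    # Illustrative inputs shaped like the ResMLP case described in the patch.
    print(summarize_gamma([0.42, 0.0, 0.0, 0.0, 0.0]))
    # Illustrative inputs shaped like the ViT-Mini case (all layers near zero).
    print(summarize_gamma([0.001, -0.002, 0.001, 0.0]))
```

Returning a `mode` label alongside the per-layer list keeps the two failure modes named in the patch distinguishable in a results table: a mean that hides one dominant layer, and a mean taken over layers that are all near zero.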
