author YurenHao0426 <Blackhao0426@gmail.com> 2026-04-07 23:44:58 -0500
committer YurenHao0426 <Blackhao0426@gmail.com> 2026-04-07 23:44:58 -0500
commit 05cd478cb45f78ccf89ab42918df9010cd534ede
tree 4b05f497b21d85607e7c8e892faae5c33ae5df64 /protocol
parent 4cd716757b50a1f4217a3ffdf8ee624c270b7a23
EVIDENCE_SUMMARY: add §3.7 CNN cross-architecture audit results
Diffstat (limited to 'protocol')
-rw-r--r-- protocol/EVIDENCE_SUMMARY.md 22
1 files changed, 21 insertions, 1 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index 395d875..f784d2f 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -57,6 +57,26 @@ and the file or memory entry where the result is recorded.
| 4-method audit at d=512 | BP trustworthy on 3/3 seeds; DFA/SB/CB walked back on 3/3 (same as d=256) | `python -m protocol.examples.audit_d512` |
| Width effect | max-per-block growth is HIGHER at d=512 (6e3-7e4 vs ~1e3 at d=256) | (in d=512 output) |
+## §3.7 Cross-architecture: CNN (no terminal LN, BatchNorm)
+
+| method × 3-seed | acc | max/block growth | ‖g_3‖ | verdict |
+|---|---:|---:|---:|---|
+| BP CNN | 0.866 ± 0.003 | 1.31× | 4e-5 | trustworthy |
+| State Bridge CNN | 0.633 ± 0.005 | 2.40× | 2e-3 | trustworthy |
+| **DFA CNN** | **0.566 ± 0.021** | **237×** | 1e-3 | walk-back via (a) only |
+| EP CNN | 0.512 ± 0.023 | 11.6× | ~6.6e-1 | trustworthy |
+| Credit Bridge CNN | 0.325 ± 0.009 | 96× | 3e-3 | walk-back via (a) only |
+
+**Key**: diagnostic (b) NEVER fires on CNN. Without terminal LN, BP grad does
+not collapse below 1e-7. Combined with the StudentNet result, this shows
+(b) is causally specific to LN architectures. DFA CNN reaches 0.566 (much
+higher than DFA ResMLP 0.31 / DFA ViT 0.24), consistent with the
+literature: classical FA papers report DFA working on shallow CNNs but
+failing on modern Transformers — the protocol gives the mechanistic
+reason (catastrophic (a)+(b) on with-LN vs mild (a) only on without-LN).
+
+Reproduce: `python -m protocol.examples.audit_cnn`
+
## §4 Two failure modes
| evidence | result | reproduce |
@@ -124,7 +144,7 @@ shallow baseline; mechanism is necessary but not sufficient.
## Status of evidence
- §1 protocol package: **DONE**, committed
-- §2 audit findings: **DONE** for ResMLP at d=256 (3 seeds, single seed) and d=512 (3 seeds); ViT audit waiting on checkpoint training
+- §2 audit findings: **DONE** for ResMLP at d=256 (3 seeds), d=512 (3 seeds), and CNN (3 seeds, 5 methods). 11 method×architecture combinations × 3 seeds = 33 audited conditions. ViT audit waiting on checkpoint training
- §3 decision utility: **DONE**, ablation table + threshold sensitivity ready
- §3 hero figure: **DONE**
- §4 temporal validation: **DONE** for 3 architectures × 3 seeds (ResMLP, ViT, StudentNet)