| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:44:58 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:44:58 -0500 |
| commit | 05cd478cb45f78ccf89ab42918df9010cd534ede (patch) | |
| tree | 4b05f497b21d85607e7c8e892faae5c33ae5df64 /protocol | |
| parent | 4cd716757b50a1f4217a3ffdf8ee624c270b7a23 (diff) | |
EVIDENCE_SUMMARY: add §3.7 CNN cross-architecture audit results
Diffstat (limited to 'protocol')
| -rw-r--r-- | protocol/EVIDENCE_SUMMARY.md | 22 |
1 files changed, 21 insertions, 1 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index 395d875..f784d2f 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -57,6 +57,26 @@ and the file or memory entry where the result is recorded.
 | 4-method audit at d=512 | BP trustworthy on 3/3 seeds; DFA/SB/CB walked back on 3/3 (same as d=256) | `python -m protocol.examples.audit_d512` |
 | Width effect | max-per-block growth is HIGHER at d=512 (6e3-7e4 vs ~1e3 at d=256) | (in d=512 output) |
 
+## §3.7 Cross-architecture: CNN (no terminal LN, BatchNorm)
+
+| method × 3-seed | acc | max/block growth | ‖g_3‖ | verdict |
+|---|---:|---:|---:|---|
+| BP CNN | 0.866 ± 0.003 | 1.31× | 4e-5 | trustworthy |
+| State Bridge CNN | 0.633 ± 0.005 | 2.40× | 2e-3 | trustworthy |
+| **DFA CNN** | **0.566 ± 0.021** | **237×** | 1e-3 | walk-back via (a) only |
+| EP CNN | 0.512 ± 0.023 | 11.6× | ~6.6e-1 | trustworthy |
+| Credit Bridge CNN | 0.325 ± 0.009 | 96× | 3e-3 | walk-back via (a) only |
+
+**Key**: diagnostic (b) NEVER fires on CNN. Without terminal LN, BP grad does
+not collapse below 1e-7. Combined with the StudentNet result, this shows
+(b) is causally specific to LN architectures. DFA CNN reaches 0.566 (much
+higher than DFA ResMLP 0.31 / DFA ViT 0.24), consistent with the
+literature: classical FA papers report DFA working on shallow CNNs but
+failing on modern Transformers — the protocol gives the mechanistic
+reason (catastrophic (a)+(b) on with-LN vs mild (a) only on without-LN).
+
+Reproduce: `python -m protocol.examples.audit_cnn`
+
 ## §4 Two failure modes
 
 | evidence | result | reproduce |
@@ -124,7 +144,7 @@ shallow baseline; mechanism is necessary but not sufficient.
 ## Status of evidence
 
 - §1 protocol package: **DONE**, committed
-- §2 audit findings: **DONE** for ResMLP at d=256 (3 seeds, single seed) and d=512 (3 seeds); ViT audit waiting on checkpoint training
+- §2 audit findings: **DONE** for ResMLP at d=256 (3 seeds), d=512 (3 seeds), and CNN (3 seeds, 5 methods). 11 method×architecture combinations × 3 seeds = 33 audited conditions. ViT audit waiting on checkpoint training
 - §3 decision utility: **DONE**, ablation table + threshold sensitivity ready
 - §3 hero figure: **DONE**
 - §4 temporal validation: **DONE** for 3 architectures × 3 seeds (ResMLP, ViT, StudentNet)
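
The verdict column of the §3.7 table can be read as a two-flag rule, sketched below in Python. This is an illustration only, not the code behind `protocol.examples.audit_cnn`. It assumes diagnostic (a) is a cutoff on max/block growth and diagnostic (b) is a floor on the gradient norm. The 1e-7 floor is the collapse level quoted in the §3.7 note. The growth cutoff of 50× is a hypothetical value chosen only because it falls between the 11.6× (trustworthy) and 96× (walk-back) rows. The names `Row`, `verdict`, `GROWTH_LIMIT`, and `GRAD_FLOOR` are likewise invented for this sketch.

```python
from dataclasses import dataclass

# Hypothetical cutoff for diagnostic (a); the commit does not state the real value.
GROWTH_LIMIT = 50.0
# Collapse level quoted in the §3.7 note, used here as the floor for diagnostic (b).
GRAD_FLOOR = 1e-7


@dataclass
class Row:
    method: str
    growth: float     # max/block growth factor, e.g. 237.0 for "237x"
    grad_norm: float  # ||g_3|| column of the table


def verdict(row, growth_limit=GROWTH_LIMIT, grad_floor=GRAD_FLOOR):
    """Map the two assumed diagnostics onto the table's verdict strings."""
    a_fires = row.growth > growth_limit    # (a): excessive per-block growth
    b_fires = row.grad_norm < grad_floor   # (b): gradient collapse
    if a_fires and b_fires:
        return "walk-back via (a)+(b)"
    if a_fires:
        return "walk-back via (a) only"
    if b_fires:
        return "walk-back via (b) only"
    return "trustworthy"


# Values copied from the §3.7 table (the EP entry is approximate in the source).
ROWS = [
    Row("BP CNN", 1.31, 4e-5),
    Row("State Bridge CNN", 2.40, 2e-3),
    Row("DFA CNN", 237.0, 1e-3),
    Row("EP CNN", 11.6, 6.6e-1),
    Row("Credit Bridge CNN", 96.0, 3e-3),
]

for row in ROWS:
    print(f"{row.method}: {verdict(row)}")
```

Under these assumed thresholds the sketch reproduces the five verdicts in the table, and (b) never fires because every listed gradient norm is well above 1e-7.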
