author    YurenHao0426 <Blackhao0426@gmail.com>  2026-04-08 19:25:42 -0500
committer YurenHao0426 <Blackhao0426@gmail.com>  2026-04-08 19:25:42 -0500
commit    a0b8169afb7981921e6599f2bc33a35a0ab9ca53 (patch)
tree      764df334339963041e2e9effe7c51afa98571d0e /protocol/EVIDENCE_SUMMARY.md
parent    2fa24acae8bb7f8c026db2f7fdade4a29b640d8d (diff)
Sync EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md with v2.32 values

These two project scratch documents had stale BP=0.609 and DFA=0.308
references from the pre-v2.31 era. Updated to the matched 30-ep 3-seed
values that v2.31-v2.32 corrected:

  BP no-pen 30ep:  0.609 → 0.585 ± 0.001
  BP+pen 30ep:     0.530 → 0.532 ± 0.006
  DFA no-pen 30ep: 0.308 → 0.301 ± 0.005
  DFA+pen 30ep:    0.363 → 0.360 ± 0.001
  Gap math:        +5.5/-8 → +5.9/-5.3 pp; +18.1/+1.4 → +18.3/+1.1 pp
  Deep cos:        +0.155 → +0.151

Now the paper, the protocol library, the README, the helper scripts, and
the project scratch docs all agree on the v2.32 values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
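
Reviewer note: the "Gap math" figures can be rechecked from the accuracies themselves. The sketch below is only an illustration, not a script from this repository. The dictionary keys and the `pp` helper are hypothetical names, the 0.349 shallow-baseline value is taken from the "DFA-shallow baseline" row in the diff, and pairing +18.3 with "BP+pen minus shallow baseline" is an inference from the numbers rather than something the commit message states.

```python
# Accuracies (3-seed means) as quoted in the commit message and diff.
acc = {
    "bp": 0.585,           # BP no-pen 30ep
    "bp_pen": 0.532,       # BP+pen 30ep
    "dfa": 0.301,          # DFA no-pen 30ep
    "dfa_pen": 0.360,      # DFA+pen 30ep
    "dfa_shallow": 0.349,  # DFA-shallow baseline (frozen)
}


def pp(a, b):
    """Return a - b in percentage points, rounded to one decimal place."""
    return round((a - b) * 100, 1)


gaps = {
    # Penalty effect within each training rule.
    "dfa_rescue": pp(acc["dfa_pen"], acc["dfa"]),
    "bp_capacity_loss": pp(acc["bp_pen"], acc["bp"]),
    # Gaps relative to the shallow baseline (pairing assumed, see note above).
    "bp_pen_vs_shallow": pp(acc["bp_pen"], acc["dfa_shallow"]),
    "dfa_pen_vs_shallow": pp(acc["dfa_pen"], acc["dfa_shallow"]),
}

for name, value in gaps.items():
    print(f"{name}: {value:+.1f} pp")
```

Under these assumptions the four gaps match the +5.9, -5.3, +18.3, and +1.1 pp values quoted in the message. The same pairing also reproduces the old +18.1 and +1.4 figures from the pre-v2.31 accuracies.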
Diffstat (limited to 'protocol/EVIDENCE_SUMMARY.md')
-rw-r--r-- protocol/EVIDENCE_SUMMARY.md | 28
1 files changed, 15 insertions, 13 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index 2591c97..bab8764 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -122,12 +122,12 @@ on deep layers. **Caught by direct per-layer cosine measurement.**
| 456 | 0.364 | 4.1e4 | 9.0e-7 | +0.139 |
| **mean** | **0.363 ± 0.001** | **4.0e4** | **9.0e-7** | **+0.151 ± 0.012** |
-### BP+penalty 2×2 grid (raw acc, primary number per round 20)
+### BP+penalty 2×2 grid (matched 30-epoch 3-seed values, paper v2.32)
| | no penalty | with penalty | penalty effect |
|---|---:|---:|---:|
-| BP | 0.609 | **0.530** | −8 pp (capacity loss) |
-| DFA | 0.308 | 0.363 | +5.5 pp (rescue) |
+| BP | 0.585 ± 0.001 | **0.532 ± 0.006** | −5.3 pp (capacity loss) |
+| DFA | 0.301 ± 0.005 | 0.360 ± 0.001 | +5.9 pp (rescue) |
### Round 20 phrasing for the gap
@@ -170,19 +170,21 @@ All 6 bugs from `protocol/CHECKLIST.md` have a reproducer:
| ViT-Mini (4-block d=128) | yes | 0.26 | 0.80 | (a)+(b) ep 1-3 | never |
| StudentNet (4-block d=128) | **no** | 0.33 | 0.62 | (a) ep 18, **(b) NEVER** | never |
-### Penalty rescue (3-seed for λ=1e-2, single-seed for others)
+### Penalty rescue (matched 30-epoch 3-seed values, paper v2.32)
| condition | acc | ‖h_L‖ | ‖g_L‖ |
|---|---:|---:|---:|
-| DFA-vanilla | 0.308 ± 0.014 | 4.4e8 | 5e-10 |
-| DFA + λ=1e-3 ‖f‖² | 0.372 (1 seed) | 4.0e4 | 7e-7 |
-| DFA + λ=1e-2 ‖f‖² | 0.363 ± 0.001 | 3.8e4 | 1e-6 |
-| DFA + λ=1e-1 ‖f‖² | (running) | (running) | (running) |
-| DFA-shallow baseline | 0.349 ± 0.002 | (n/a) | (n/a) |
-| BP-trainable | 0.609 ± 0.004 | 2.0e2 | 5e-5 |
-
-The penalty rescues by +5.5 pp over vanilla DFA but only +1.4 pp over the
-shallow baseline; mechanism is necessary but not sufficient.
+| DFA-vanilla 30ep (3-seed) | 0.301 ± 0.005 | 4.4e8 (s42) / 5e8 (3-seed mean) | 4e-10 |
+| DFA + λ=1e-4 ‖f‖² 30ep (3-seed) | 0.360 | 2.2e4 | 7e-7 |
+| DFA + λ=1e-2 ‖f‖² 30ep (3-seed) | 0.360 ± 0.001 | 1.3e4 | 1.6e-6 |
+| DFA + λ=1e-1 ‖f‖² 30ep (s42) | 0.349 | 1.2e4 | 1.6e-6 |
+| DFA-shallow baseline (frozen) | 0.349 ± 0.002 | (n/a) | (n/a) |
+| BP-trainable 30ep (3-seed) | 0.585 ± 0.001 | (n/a) | (n/a) |
+| BP-trainable 100ep (3-seed) | 0.6147 ± 0.004 | 2.0e2 | 5e-5 |
+| BP+pen λ=1e-2 30ep (3-seed) | 0.532 ± 0.006 | 4.0e4 | (matches DFA+pen) |
+
+The penalty rescues by +5.9 pp over vanilla DFA (matched 30-ep) but only
++1.1 pp over the shallow baseline; mechanism is necessary but not sufficient.
## Figures (paper-ready)