| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 18:03:16 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 18:03:16 -0500 |
| commit | 752dfb833b06a6fb974df892de560caf328ed1dd (patch) | |
| tree | 4b29ea7737d7066c9ca33675f7061b89da2c55f2 /results/bp_nopen_30ep_3seed.log | |
| parent | 4aa9d89a28ad9d5a9cfe1ef685a5e100a648b4ed (diff) | |
paper v2.31: matched 30-epoch BP/DFA controls (was unsourced 0.609/0.308)
The §5 ¶3 BP-no-penalty value of 0.609 ± 0.004 and DFA-no-penalty value
of 0.308 ± 0.014 turned out to be unsourced — they were carried over
from a hardcoded comment in experiments/bp_with_penalty_control.py
("BP-trainable (3-seed mean): 0.609") that nobody had actually measured
with a matched 30-epoch run.
Ran the missing matched controls under the same recipe as BP+pen
(lam=0, 30 epochs, AdamW 1e-3, wd 0.01, cosine schedule, batch 128,
3 seeds 42/123/456):
BP no-pen 30ep: per-seed 0.5851, 0.5845, 0.5863 → 0.585 ± 0.001
(paper said 0.609 ± 0.004, off by 0.024)
DFA no-pen 30ep: per-seed 0.3070, 0.2985, 0.2966 → 0.301 ± 0.005
(paper said 0.308 ± 0.014)
Also re-grounded DFA+penalty 30ep using the dfa_pen_short 3-seed run
(0.3593, 0.3610, 0.3604 → 0.360 ± 0.001), which is what the deep-cosine
+0.155 figure was computed on. The paper had 0.363 ± 0.001, which came
from the 100-epoch run rather than the 30-epoch run, so comparing it
with BP+pen 30-ep was apples to oranges.
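The per-seed values above can be re-aggregated as a quick sanity check. The sketch below is not taken from the repository. It assumes the ± figures are population standard deviations (ddof=0), an inference from the numbers: the DFA no-pen seeds give about 0.0045 with the population formula (rounds to 0.005, as quoted) but about 0.0055 with the sample formula. The dictionary keys and the `summarize` helper are illustrative names.

```python
from statistics import mean, pstdev

# Per-seed final test accuracies quoted in the commit message (seeds 42/123/456).
runs = {
    "BP no-pen 30ep": [0.5851, 0.5845, 0.5863],
    "DFA no-pen 30ep": [0.3070, 0.2985, 0.2966],
    "DFA+pen 30ep": [0.3593, 0.3610, 0.3604],
}


def summarize(values):
    """Return (mean, population std), each rounded to 3 decimals."""
    # Assumption: population std (ddof=0); this reproduces the quoted ± 0.005
    # for DFA no-pen, whereas the sample std would round to 0.006.
    return round(mean(values), 3), round(pstdev(values), 3)


summary = {name: summarize(vals) for name, vals in runs.items()}
for name, (m, s) in summary.items():
    print(f"{name}: {m:.3f} ± {s:.3f}")
```

Under that assumption the three lines reproduce 0.585 ± 0.001, 0.301 ± 0.005, and 0.360 ± 0.001.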
Paper changes (§5 ¶3):
BP penalty cost: -8 pp → -5.5 pp
DFA pen rescue: +5.5 → +5.9 pp
DFA+pen margin vs frozen: +1.4 → +1.1 pp
BP-to-DFA gap: 17 → 17.0 pp (unchanged)
BP-to-SB gap: 7.7 → 7.7 pp (unchanged)
The BP-to-DFA gap remains the paper's lower-bound claim for the
credit-quality cost; its magnitude (17 pp) is unchanged.
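Two of the deltas listed under "Paper changes" can be recomputed from values that appear in this commit. The sketch below is a hedged illustration: `pp` is a hypothetical helper, and treating the log's "DFA-shallow: 0.349" as the "frozen" baseline is an assumption. The BP+pen and SB means are not stated here, so the remaining rows are not recomputed.

```python
def pp(a, b):
    """Difference a - b between two accuracies, in percentage points (1 decimal)."""
    return round((a - b) * 100, 1)


# Assumption: the "frozen" baseline is the DFA-shallow value printed in the log.
DFA_SHALLOW = 0.349

old = {"dfa_pen": 0.363, "dfa_nopen": 0.308, "bp_nopen": 0.609}
new = {"dfa_pen": 0.360, "dfa_nopen": 0.301, "bp_nopen": 0.585}

# DFA penalty rescue: DFA+pen minus DFA no-pen.
rescue_old = pp(old["dfa_pen"], old["dfa_nopen"])
rescue_new = pp(new["dfa_pen"], new["dfa_nopen"])

# DFA+pen margin over the frozen (DFA-shallow) baseline.
margin_old = pp(old["dfa_pen"], DFA_SHALLOW)
margin_new = pp(new["dfa_pen"], DFA_SHALLOW)

# Size of the correction to the BP no-penalty number.
bp_shift = pp(old["bp_nopen"], new["bp_nopen"])

print(f"DFA pen rescue: +{rescue_old} -> +{rescue_new} pp")
print(f"DFA+pen margin vs frozen: +{margin_old} -> +{margin_new} pp")
print(f"BP no-pen correction: {bp_shift} pp")
```

This matches the +5.5 → +5.9 and +1.4 → +1.1 rows, and the 0.024 offset noted for BP no-pen.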
Also updated:
- §5 ¶1 prose: 0.363 → 0.360, 0.308 → 0.301
- §4 ¶4 prose: DFA+pen 0.363 → 0.360
- Appendix J Table 9 caption: 0.363 → 0.360, +9.0 → +9.3 pp gap to SB
- Appendix L paragraph: +5.5 → +5.9 pp DFA penalty rescue
- Figure 3 panel C bar values + title pen-cost annotation
- New results/matched_30ep_control_summary.json as auditable record
Page layout preserved: 9 main pages + refs p10, 18 total, 0 overfull.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'results/bp_nopen_30ep_3seed.log')
| -rw-r--r-- | results/bp_nopen_30ep_3seed.log | 58 |
1 files changed, 58 insertions, 0 deletions
diff --git a/results/bp_nopen_30ep_3seed.log b/results/bp_nopen_30ep_3seed.log
new file mode 100644
index 0000000..f88e725
--- /dev/null
+++ b/results/bp_nopen_30ep_3seed.log
@@ -0,0 +1,58 @@
+=== BP no-pen 30ep seed=42 ===
+BP + ‖f‖² penalty: seed=42, lam=0.0, epochs=30
+ ep 1: test_acc=0.3920
+ ep 5: test_acc=0.4950
+ ep 10: test_acc=0.5296
+ ep 15: test_acc=0.5539
+ ep 20: test_acc=0.5721
+ ep 25: test_acc=0.5813
+ ep 30: test_acc=0.5851
+
+FINAL test acc: 0.5851
+Compare to:
+ BP-trainable (3-seed mean): 0.609
+ Penalized DFA lam=1e-2: 0.363
+ DFA-shallow: 0.349
+
+Margin vs DFA-shallow baseline: +23.61 pp
+ → BP+penalty intermediate; partial capacity loss + residual mode 2
+Saved results/bp_no_penalty_30ep/bp_pen_lam0.0_s42.json
+=== BP no-pen 30ep seed=123 ===
+BP + ‖f‖² penalty: seed=123, lam=0.0, epochs=30
+ ep 1: test_acc=0.3958
+ ep 5: test_acc=0.4851
+ ep 10: test_acc=0.5345
+ ep 15: test_acc=0.5564
+ ep 20: test_acc=0.5748
+ ep 25: test_acc=0.5827
+ ep 30: test_acc=0.5845
+
+FINAL test acc: 0.5845
+Compare to:
+ BP-trainable (3-seed mean): 0.609
+ Penalized DFA lam=1e-2: 0.363
+ DFA-shallow: 0.349
+
+Margin vs DFA-shallow baseline: +23.55 pp
+ → BP+penalty intermediate; partial capacity loss + residual mode 2
+Saved results/bp_no_penalty_30ep/bp_pen_lam0.0_s123.json
+=== BP no-pen 30ep seed=456 ===
+BP + ‖f‖² penalty: seed=456, lam=0.0, epochs=30
+ ep 1: test_acc=0.3969
+ ep 5: test_acc=0.4957
+ ep 10: test_acc=0.5344
+ ep 15: test_acc=0.5612
+ ep 20: test_acc=0.5716
+ ep 25: test_acc=0.5845
+ ep 30: test_acc=0.5863
+
+FINAL test acc: 0.5863
+Compare to:
+ BP-trainable (3-seed mean): 0.609
+ Penalized DFA lam=1e-2: 0.363
+ DFA-shallow: 0.349
+
+Margin vs DFA-shallow baseline: +23.73 pp
+ → BP+penalty intermediate; partial capacity loss + residual mode 2
+Saved results/bp_no_penalty_30ep/bp_pen_lam0.0_s456.json
+=== ALL DONE ===
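Since the log is the auditable source for the per-seed numbers, a small parser can pull the final accuracies out of it. The following is a minimal sketch, not code from the repository: `parse_finals`, the two regexes, and the abbreviated `SAMPLE_LOG` are illustrative, and the line formats are assumed to match the `=== ... seed=N ===` and `FINAL test acc:` lines shown in this diff.

```python
import re

# Abbreviated excerpt in the same format as results/bp_nopen_30ep_3seed.log.
SAMPLE_LOG = """\
=== BP no-pen 30ep seed=42 ===
 ep 30: test_acc=0.5851

FINAL test acc: 0.5851
=== BP no-pen 30ep seed=123 ===
FINAL test acc: 0.5845
=== BP no-pen 30ep seed=456 ===
FINAL test acc: 0.5863
=== ALL DONE ===
"""

SEED_RE = re.compile(r"^=== .* seed=(\d+) ===$")
FINAL_RE = re.compile(r"^FINAL test acc: ([0-9.]+)$")


def parse_finals(text):
    """Map each seed to the accuracy on its 'FINAL test acc' line."""
    finals = {}
    seed = None
    for raw in text.splitlines():
        line = raw.strip()
        seed_match = SEED_RE.match(line)
        if seed_match:
            seed = int(seed_match.group(1))  # start of a new per-seed section
            continue
        final_match = FINAL_RE.match(line)
        if final_match and seed is not None:
            finals[seed] = float(final_match.group(1))
    return finals


finals = parse_finals(SAMPLE_LOG)
# Margin over the DFA-shallow baseline (0.349) that the log prints for seed 42.
margin_s42 = round((finals[42] - 0.349) * 100, 2)
print(finals, margin_s42)
```

To run it against the real file, pass the file's contents to `parse_finals` in place of `SAMPLE_LOG`.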
