faeval.git

Age	Commit message	Author
2026-04-08	paper v2.32: BP+penalty multi-seeded (was single-seed s42)	YurenHao0426
	The §5 ¶3 BP+penalty value (0.530, +18.1 pp margin) was single-seed s42.
	Ran s123 and s456 to multi-seed it, matching the BP-no-pen 3-seed control.

	3-seed BP+pen 30ep results (lam=0.01, AdamW lr=1e-3 wd=0.01, cosine, batch 128):
	  s42:  0.5303, +18.13 pp vs frozen
	  s123: 0.5262, +17.72 pp
	  s456: 0.5397, +19.07 pp
	  3-seed mean: 0.5321 ± 0.0057, +18.31 pp

	Updates:
	- §5 ¶3: BP+pen "0.530 (single seed)" → "0.532 ± 0.006" (3-seed)
	- §5 ¶3: BP penalty cost -5.5 pp → -5.3 pp
	- §5 ¶3: BP+pen margin +18.1 → +18.3 pp
	- §5 ¶3: BP-to-DFA gap 17.0 → 17.2 pp
	- §4 ¶4: BP+pen +18.1 → +18.3 pp comparison
	- Figure 3 panel C bar values: BP with_pen 0.530 → 0.532
	- Figure 3 panel C title: BP-pen-cost -5.5pp → -5.3pp

	The +18.3 pp 3-seed mean is essentially the same as the s42 single-seed
	+18.13 pp, so the headline conclusion (BP+pen far above frozen baseline,
	huge gap vs DFA+pen) is unchanged. This commit removes the last
	single-seed value labeled as a key control.

	New auditable file: results/bp_with_penalty_3seed_summary.json

	Page layout preserved: 9 pages main, refs p10, 0 overfull boxes.

	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
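The 3-seed aggregate in this commit can be re-derived from the per-seed accuracies. The following is a minimal sketch, not the repository's own script: the names `SEED_ACC`, `BASELINE_ACC`, and `summarize` are hypothetical, and the 0.349 baseline is assumed from the margin figure quoted in the earlier control commit. The reported ± 0.0057 appears to match the population standard deviation (`statistics.pstdev`); that is an inference from the numbers, since the commit does not say which estimator was used.

```python
import statistics

# Per-seed BP+penalty test accuracies, copied from the commit message.
SEED_ACC = {"s42": 0.5303, "s123": 0.5262, "s456": 0.5397}

# Assumption: reference accuracy the margins are measured against,
# taken from the 0.349 figure quoted in the earlier control commit.
BASELINE_ACC = 0.349


def summarize(seed_acc, baseline):
    """Return mean accuracy, population std, and per-seed margins in pp."""
    accs = list(seed_acc.values())
    mean_acc = statistics.mean(accs)
    # Population std (divide by N); the sample std would be larger.
    std_acc = statistics.pstdev(accs)
    margins_pp = {s: (a - baseline) * 100.0 for s, a in seed_acc.items()}
    mean_margin_pp = statistics.mean(margins_pp.values())
    return mean_acc, std_acc, margins_pp, mean_margin_pp


mean_acc, std_acc, margins_pp, mean_margin_pp = summarize(SEED_ACC, BASELINE_ACC)
print(f"mean acc    : {mean_acc:.4f} +/- {std_acc:.4f}")
for seed, margin in margins_pp.items():
    print(f"{seed:>4} margin : {margin:+.2f} pp")
print(f"mean margin : {mean_margin_pp:+.2f} pp")
```

With only three seeds the choice between population and sample standard deviation is visible in the third decimal place, so it may be worth recording the estimator alongside the summary file.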
2026-04-08	BP+penalty control result: mode 2 (intrinsic credit quality) confirmed REAL	YurenHao0426
	BP + lam=1e-2 ||f||^2 penalty trained for 30 epochs (s42):
	  ep 30 final: test_acc 0.5303
	  margin vs DFA-shallow 0.349: +18.13 pp

	The 2x2 accuracy grid:
	         no penalty   with penalty
	  BP     0.609        0.530
	  DFA    0.308        0.363

	Penalty effect on BP:  -8 pp (capacity regularization cost)
	Penalty effect on DFA: +5.5 pp (rescue from active harm)

	Mode 2 (intrinsic credit quality) is confirmed REAL by this control:
	even after the penalty's capacity cost, BP achieves +18 pp depth
	utilization. DFA under the same penalty achieves only +1.4 pp. The
	difference (~17 pp) cannot be attributed to capacity loss; it is a
	genuine credit-quality cost of random feedback vs true backprop gradient.

	This validates the round 19 'two distinct failure modes' framing: mode 2
	is not a penalty-induced regularization artifact.
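The arithmetic behind the 2x2 grid can be checked directly. This is a minimal sketch under stated assumptions: `GRID`, `SHALLOW_ACC`, and `pp` are hypothetical names, and the 0.349 reference is the DFA-shallow accuracy quoted in this commit. The grid values are the rounded ones from the message, so results agree with the quoted figures only up to rounding (for example, the BP penalty effect comes out as -7.9 pp, which the message reports as -8 pp).

```python
# Accuracy grid from the commit message: (no penalty, with penalty).
GRID = {"BP": (0.609, 0.530), "DFA": (0.308, 0.363)}

# Reference accuracy quoted in the commit as "DFA-shallow 0.349".
SHALLOW_ACC = 0.349


def pp(delta):
    """Convert an accuracy difference to percentage points."""
    return delta * 100.0


# Effect of adding the penalty, per training method.
penalty_effect_pp = {m: pp(w - n) for m, (n, w) in GRID.items()}

# Margin of each penalized model over the shallow reference.
margin_pp = {m: pp(w - SHALLOW_ACC) for m, (_, w) in GRID.items()}

# Gap that remains between BP and DFA once both carry the penalty.
credit_gap_pp = margin_pp["BP"] - margin_pp["DFA"]

for method in GRID:
    print(f"{method:>3}: penalty effect {penalty_effect_pp[method]:+.1f} pp, "
          f"margin with penalty {margin_pp[method]:+.1f} pp")
print(f"BP minus DFA margin under penalty: {credit_gap_pp:+.1f} pp")
```

Because both methods carry the same penalty in the right-hand column, the remaining gap is the quantity the commit attributes to credit quality rather than to capacity loss.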