<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/results/bp_with_penalty/bp_pen_lam0.01_s123.json, branch master</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>paper v2.32: BP+penalty multi-seeded (was single-seed s42)</title>
<updated>2026-04-09T00:11:40+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-09T00:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=d022688ea9fcfcb81f900751ee92e35597ef19b8'/>
<id>d022688ea9fcfcb81f900751ee92e35597ef19b8</id>
<content type='text'>
The §5 ¶3 BP+penalty value (0.530, +18.1 pp margin) was single-seed s42.
Ran s123 and s456 to multi-seed it, matching the BP-no-pen 3-seed control.

3-seed BP+pen 30ep results (lam=0.01, AdamW lr=1e-3 wd=0.01, cosine, batch 128):
  s42:  0.5303, +18.13 pp vs frozen
  s123: 0.5262, +17.72 pp
  s456: 0.5397, +19.07 pp
  3-seed mean: 0.5321 ± 0.0057, +18.31 pp

Updates:
- §5 ¶3: BP+pen "0.530 (single seed)" → "0.532 ± 0.006" (3-seed)
- §5 ¶3: BP penalty cost -5.5 pp → -5.3 pp
- §5 ¶3: BP+pen margin +18.1 → +18.3 pp
- §5 ¶3: BP-to-DFA gap 17.0 → 17.2 pp
- §4 ¶4: BP+pen +18.1 → +18.3 pp comparison
- Figure 3 panel C bar values: BP with_pen 0.530 → 0.532
- Figure 3 panel C title: BP-pen-cost -5.5pp → -5.3pp

The +18.3 pp 3-seed mean is essentially the same as the s42 single-seed
+18.13 pp, so the headline conclusion (BP+pen far above frozen baseline,
huge gap vs DFA+pen) is unchanged. This commit removes the last
single-seed value labeled as a key control.

New auditable file: results/bp_with_penalty_3seed_summary.json

Page layout preserved: 9 pages main, refs p10, 0 overfull boxes.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The §5 ¶3 BP+penalty value (0.530, +18.1 pp margin) was single-seed s42.
Ran s123 and s456 to multi-seed it, matching the BP-no-pen 3-seed control.

3-seed BP+pen 30ep results (lam=0.01, AdamW lr=1e-3 wd=0.01, cosine, batch 128):
  s42:  0.5303, +18.13 pp vs frozen
  s123: 0.5262, +17.72 pp
  s456: 0.5397, +19.07 pp
  3-seed mean: 0.5321 ± 0.0057, +18.31 pp

Updates:
- §5 ¶3: BP+pen "0.530 (single seed)" → "0.532 ± 0.006" (3-seed)
- §5 ¶3: BP penalty cost -5.5 pp → -5.3 pp
- §5 ¶3: BP+pen margin +18.1 → +18.3 pp
- §5 ¶3: BP-to-DFA gap 17.0 → 17.2 pp
- §4 ¶4: BP+pen +18.1 → +18.3 pp comparison
- Figure 3 panel C bar values: BP with_pen 0.530 → 0.532
- Figure 3 panel C title: BP-pen-cost -5.5pp → -5.3pp

The +18.3 pp 3-seed mean is essentially the same as the s42 single-seed
+18.13 pp, so the headline conclusion (BP+pen far above frozen baseline,
huge gap vs DFA+pen) is unchanged. This commit removes the last
single-seed value labeled as a key control.

New auditable file: results/bp_with_penalty_3seed_summary.json

Page layout preserved: 9 pages main, refs p10, 0 overfull boxes.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
