<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/protocol/examples/plot_penalty_rescue.py, branch master</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>Sync experiment+protocol scripts with v2.32 corrected control values</title>
<updated>2026-04-09T00:24:06+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-09T00:24:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=2fa24acae8bb7f8c026db2f7fdade4a29b640d8d'/>
<id>2fa24acae8bb7f8c026db2f7fdade4a29b640d8d</id>
<content type='text'>
The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to
the paper), so the stale values didn't cause numerical errors in the
paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug
came from exactly this kind of hardcoded "for-comparison" comment that
was never measured. Updating them now to remove the same trap from
future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep, results/
dfa_pen_short, and results/bp_with_penalty.

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale
numbers — these are project scratch documents and not user-facing.
Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to
the paper), so the stale values didn't cause numerical errors in the
paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug
came from exactly this kind of hardcoded "for-comparison" comment that
was never measured. Updating them now to remove the same trap from
future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep, results/
dfa_pen_short, and results/bp_with_penalty.

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale
numbers — these are project scratch documents and not user-facing.
Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add §4 penalty rescue figure: visual two-failure-modes story</title>
<updated>2026-04-08T04:32:44+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T04:32:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=d5185a3cc692fe96c93bbc5d7b286b7080ba7458'/>
<id>d5185a3cc692fe96c93bbc5d7b286b7080ba7458</id>
<content type='text'>
3-panel side-by-side showing per-epoch trajectories of vanilla DFA vs
DFA + lambda*||f||^2 penalty:

  (a) ||h_L||:    vanilla 4e8 vs penalty 4e4 (4 OOM rescue)
  (b) ||g_L||:    vanilla 5e-10 vs penalty ~1e-6 (4 OOM rescue)
  (d) test acc:   vanilla 0.31 vs penalty 0.36 vs frozen baseline 0.349 vs BP 0.61

The visual story: (a) and (b) show the penalty pulling the diagnostics
back into the healthy regime, but (d) shows the rescue translates to
only +1 pp above the DFA-shallow baseline and 24 pp below BP-trainable.
The two failure modes (scale + direction) are visually separable: scale
is fixed, direction is not.

Together with figure_audit_5method.png and figure_cross_arch_temporal_s42.png,
this is the third paper-ready figure for §3-§4.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
3-panel side-by-side showing per-epoch trajectories of vanilla DFA vs
DFA + lambda*||f||^2 penalty:

  (a) ||h_L||:    vanilla 4e8 vs penalty 4e4 (4 OOM rescue)
  (b) ||g_L||:    vanilla 5e-10 vs penalty ~1e-6 (4 OOM rescue)
  (d) test acc:   vanilla 0.31 vs penalty 0.36 vs frozen baseline 0.349 vs BP 0.61

The visual story: (a) and (b) show the penalty pulling the diagnostics
back into the healthy regime, but (d) shows the rescue translates to
only +1 pp above the DFA-shallow baseline and 24 pp below BP-trainable.
The two failure modes (scale + direction) are visually separable: scale
is fixed, direction is not.

Together with figure_audit_5method.png and figure_cross_arch_temporal_s42.png,
this is the third paper-ready figure for §3-§4.
</pre>
</div>
</content>
</entry>
</feed>
