<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/experiments/resmlp_frozen_blocks_baseline.py, branch master</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>BP+EP audit for d=512 L=2 qualifying seeds + CIFAR-100 support</title>
<updated>2026-04-26T14:31:30+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-26T14:31:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=a501c1c84b6ac4ff7dbf2e4b92cebd3122eb7abe'/>
<id>a501c1c84b6ac4ff7dbf2e4b92cebd3122eb7abe</id>
<content type='text'>
BP, FA, and DFA results for qualifying seeds (1, 2, 5) on d=512 L=2:
  BP s1: 0.606, s2: 0.608, s5: 0.607 (all above frozen 0.349)
  FA s1: 0.347, s2: 0.346, s5: 0.341 (all below frozen, cos +0.47-0.49)
  DFA s1: 0.298, s2: 0.297, s5: 0.296 (all below frozen, cos +0.18-0.21)

EP did not save (likely architecture compatibility issue at d=512 L=2).

Also: added CIFAR-100 dataset support to both cifar_resmlp.py and
resmlp_frozen_blocks_baseline.py for the harder-task scan.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BP, FA, and DFA results for qualifying seeds (1, 2, 5) on d=512 L=2:
  BP s1: 0.606, s2: 0.608, s5: 0.607 (all above frozen 0.349)
  FA s1: 0.347, s2: 0.346, s5: 0.341 (all below frozen, cos +0.47-0.49)
  DFA s1: 0.298, s2: 0.297, s5: 0.296 (all below frozen, cos +0.18-0.21)

EP did not save (likely architecture compatibility issue at d=512 L=2).

Also: added CIFAR-100 dataset support to both cifar_resmlp.py and
resmlp_frozen_blocks_baseline.py for the harder-task scan.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Find setting where both FA and DFA fail: d=512 L=2 ResMLP</title>
<updated>2026-04-26T13:45:34+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-26T13:45:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=9751e97dd190b8667c337215dcb70e0cab8f92ff'/>
<id>9751e97dd190b8667c337215dcb70e0cab8f92ff</id>
<content type='text'>
TASK COMPLETE: Found 3/10 seeds where BOTH FA and DFA fall below
the frozen-blocks baseline while reporting positive cosine and
nontrivial accuracy — proving that the standard evaluation pair
can simultaneously miss both FA and DFA on the same setting.

Setting: d=512 L=2 pre-LayerNorm ResMLP, CIFAR-10, 100 epochs
Frozen baseline (3-seed mean): 0.349

Qualifying seeds:
  seed 1: DFA=0.298 (cos +0.206), FA=0.347 (cos +0.484)
  seed 2: DFA=0.297 (cos +0.179), FA=0.346 (cos +0.472)
  seed 5: DFA=0.296 (cos +0.194), FA=0.341 (cos +0.492)

All qualifying cases have:
  - Both methods below frozen baseline ✓
  - Both methods report positive aggregate cosine ✓
  - Both methods above chance (~0.10) ✓
  - Standard reporting pair (acc + Γ) would NOT walk back either ✓

DFA is below frozen in ALL 10/10 seeds (mean 0.300 ± 0.009).
FA is below frozen in 3/10 seeds (mean across all 10: 0.370 ± 0.026).

Also includes:
- Frozen baselines for d=512 at L=2,4,8,12 × 3 seeds (12 runs)
- resmlp_frozen_blocks_baseline.py patched with --num_blocks arg

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TASK COMPLETE: Found 3/10 seeds where BOTH FA and DFA fall below
the frozen-blocks baseline while reporting positive cosine and
nontrivial accuracy — proving that the standard evaluation pair
can simultaneously miss both FA and DFA on the same setting.

Setting: d=512 L=2 pre-LayerNorm ResMLP, CIFAR-10, 100 epochs
Frozen baseline (3-seed mean): 0.349

Qualifying seeds:
  seed 1: DFA=0.298 (cos +0.206), FA=0.347 (cos +0.484)
  seed 2: DFA=0.297 (cos +0.179), FA=0.346 (cos +0.472)
  seed 5: DFA=0.296 (cos +0.194), FA=0.341 (cos +0.492)

All qualifying cases have:
  - Both methods below frozen baseline ✓
  - Both methods report positive aggregate cosine ✓
  - Both methods above chance (~0.10) ✓
  - Standard reporting pair (acc + Γ) would NOT walk back either ✓

DFA is below frozen in ALL 10/10 seeds (mean 0.300 ± 0.009).
FA is below frozen in 3/10 seeds (mean across all 10: 0.370 ± 0.026).

Also includes:
- Frozen baselines for d=512 at L=2,4,8,12 × 3 seeds (12 runs)
- resmlp_frozen_blocks_baseline.py patched with --num_blocks arg

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Sync experiment+protocol scripts with v2.32 corrected control values</title>
<updated>2026-04-09T00:24:06+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-09T00:24:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=2fa24acae8bb7f8c026db2f7fdade4a29b640d8d'/>
<id>2fa24acae8bb7f8c026db2f7fdade4a29b640d8d</id>
<content type='text'>
The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to
the paper), so the stale values didn't cause numerical errors in the
paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug
came from exactly this kind of hardcoded "for-comparison" comment that
was never measured. Updating them now to remove the same trap from
future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep,
results/dfa_pen_short, and results/bp_with_penalty.

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale
numbers — these are project scratch documents and not user-facing.
Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to
the paper), so the stale values didn't cause numerical errors in the
paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug
came from exactly this kind of hardcoded "for-comparison" comment that
was never measured. Updating them now to remove the same trap from
future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep,
results/dfa_pen_short, and results/bp_with_penalty.

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale
numbers — these are project scratch documents and not user-facing.
Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
