<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/experiments, branch master</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>Add new experiment scripts, figures, and paper assets; untrack pyc/build artifacts</title>
<updated>2026-06-14T09:06:32+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-06-14T09:06:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=aa73718eb6427d7da3b9cb416275802d90c4b2ed'/>
<id>aa73718eb6427d7da3b9cb416275802d90c4b2ed</id>
<content type='text'>
Co-Authored-By: Claude Opus 4.8 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-Authored-By: Claude Opus 4.8 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>BP+EP audit for d=512 L=2 qualifying seeds + CIFAR-100 support</title>
<updated>2026-04-26T14:31:30+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-26T14:31:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=a501c1c84b6ac4ff7dbf2e4b92cebd3122eb7abe'/>
<id>a501c1c84b6ac4ff7dbf2e4b92cebd3122eb7abe</id>
<content type='text'>
BP results for qualifying seeds (1, 2, 5) on d=512 L=2:
  BP s1: 0.606, s2: 0.608, s5: 0.607 (all above frozen 0.349)
  FA s1: 0.347, s2: 0.346, s5: 0.341 (all below frozen, cos +0.47-0.49)
  DFA s1: 0.298, s2: 0.297, s5: 0.296 (all below frozen, cos +0.18-0.21)

EP results were not saved (likely an architecture compatibility issue at d=512 L=2).

Also: added CIFAR-100 dataset support to both cifar_resmlp.py and
resmlp_frozen_blocks_baseline.py for the harder-task scan.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BP results for qualifying seeds (1, 2, 5) on d=512 L=2:
  BP s1: 0.606, s2: 0.608, s5: 0.607 (all above frozen 0.349)
  FA s1: 0.347, s2: 0.346, s5: 0.341 (all below frozen, cos +0.47-0.49)
  DFA s1: 0.298, s2: 0.297, s5: 0.296 (all below frozen, cos +0.18-0.21)

EP results were not saved (likely an architecture compatibility issue at d=512 L=2).

Also: added CIFAR-100 dataset support to both cifar_resmlp.py and
resmlp_frozen_blocks_baseline.py for the harder-task scan.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Find setting where both FA and DFA fail: d=512 L=2 ResMLP</title>
<updated>2026-04-26T13:45:34+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-26T13:45:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=9751e97dd190b8667c337215dcb70e0cab8f92ff'/>
<id>9751e97dd190b8667c337215dcb70e0cab8f92ff</id>
<content type='text'>
TASK COMPLETE: Found 3/10 seeds where BOTH FA and DFA fall below
the frozen-blocks baseline while reporting positive cosine and
nontrivial accuracy. This shows that the standard evaluation pair
can miss the failure of both FA and DFA in the same setting.

Setting: d=512 L=2 pre-LayerNorm ResMLP, CIFAR-10, 100 epochs
Frozen baseline (3-seed mean): 0.349

Qualifying seeds:
  seed 1: DFA=0.298 (cos +0.206), FA=0.347 (cos +0.484)
  seed 2: DFA=0.297 (cos +0.179), FA=0.346 (cos +0.472)
  seed 5: DFA=0.296 (cos +0.194), FA=0.341 (cos +0.492)

All qualifying cases have:
  - Both methods below frozen baseline ✓
  - Both methods report positive aggregate cosine ✓
  - Both methods above chance (~0.10) ✓
  - Standard reporting pair (acc + Γ) would NOT walk back either ✓

DFA is below frozen in ALL 10/10 seeds (mean 0.300 ± 0.009).
FA is below frozen in 3/10 seeds (mean across all 10: 0.370 ± 0.026).

Also includes:
- Frozen baselines for d=512 at L=2,4,8,12 × 3 seeds (12 runs)
- resmlp_frozen_blocks_baseline.py patched with --num_blocks arg

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TASK COMPLETE: Found 3/10 seeds where BOTH FA and DFA fall below
the frozen-blocks baseline while reporting positive cosine and
nontrivial accuracy. This shows that the standard evaluation pair
can miss the failure of both FA and DFA in the same setting.

Setting: d=512 L=2 pre-LayerNorm ResMLP, CIFAR-10, 100 epochs
Frozen baseline (3-seed mean): 0.349

Qualifying seeds:
  seed 1: DFA=0.298 (cos +0.206), FA=0.347 (cos +0.484)
  seed 2: DFA=0.297 (cos +0.179), FA=0.346 (cos +0.472)
  seed 5: DFA=0.296 (cos +0.194), FA=0.341 (cos +0.492)

All qualifying cases have:
  - Both methods below frozen baseline ✓
  - Both methods report positive aggregate cosine ✓
  - Both methods above chance (~0.10) ✓
  - Standard reporting pair (acc + Γ) would NOT walk back either ✓

DFA is below frozen in ALL 10/10 seeds (mean 0.300 ± 0.009).
FA is below frozen in 3/10 seeds (mean across all 10: 0.370 ± 0.026).

Also includes:
- Frozen baselines for d=512 at L=2,4,8,12 × 3 seeds (12 runs)
- resmlp_frozen_blocks_baseline.py patched with --num_blocks arg

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add vanilla FA (Lillicrap 2016) implementation + full experiment suite</title>
<updated>2026-04-23T04:46:33+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-23T04:46:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=05c935ab03ee0bdb8597d19466192dfb92ee889d'/>
<id>05c935ab03ee0bdb8597d19466192dfb92ee889d</id>
<content type='text'>
PAPER-CHANGING FINDING: FA is dramatically different from DFA on the
same architecture. FA has genuine deep credit quality where DFA has none.

Implementation:
- experiments/cifar_resmlp.py: added train_fa() + FA diagnostic support
  FA uses sequential backward credit propagation with d×d random matrices
  (a_l = B_l @ a_{l+1}) instead of DFA's direct output-error projection
  (a_l = B_l^T @ e_T). Same local loss form &lt;f_l, a_l&gt;.

Core results (A-H, 100ep 3-seed d=256 terminal-LN ResMLP):

  FA main audit:    0.401 ± 0.009 (DFA: 0.306 ± 0.008)  +9.5 pp
  FA vs frozen:     +5.2 pp ABOVE baseline (DFA: -4.3 pp below)
  FA deep cos:      +0.33 (DFA: ~0 degenerate)
  FA ||h_L||:       ~10^5 (DFA: ~5×10^8)  3 orders of magnitude less growth
  FA ||g_L||:       ~10^-6 meaningful (DFA: ~10^-10 floor)
  Mode 1(b) fires:  NO for FA; YES for DFA

  FA+pen lam=1e-2:  0.369 ± 0.003 (DFA+pen: 0.360 ± 0.002)
  FA+pen lam=1e-4:  0.377 ± 0.006 (DFA+pen lam=1e-4: 0.360)
    At lam=1e-4, FA already has deep cos +0.30 while DFA has -0.02

  FA random-target: acc 0.12 (chance), h_L=1.3e5 (DFA: 1.7e8)
  FA early 5ep:     deep cos already +0.32 (DFA ep1: -0.008)

Extension results (d=512 depth sweep, 100ep, s42):
  L=2:  FA 0.350, cos +0.96  (DFA: n/a)
  L=4:  FA 0.424, cos +0.29  (DFA: n/a)
  L=6:  FA 0.401, cos +0.16  (DFA: n/a)
  L=8:  FA 0.409, cos +0.11  (DFA: 0.306, cos -0.0001)
  L=12: FA 0.404, cos +0.09  (DFA: 0.309, cos -0.0001)

FA deep cos is positive at EVERY depth; DFA is ~0 everywhere.
FA accuracy exceeds DFA by about 10 pp at L=8 (10.3 pp) and L=12 (9.5 pp).

This is the strongest empirical support for the Mode 2 → Mode 1
hypothesis: same local loss, same architecture, same optimizer —
only the credit signal differs. FA's sequential propagation produces
much better per-layer credit (cos +0.33 vs ~0), which prevents the
catastrophic activation growth that DFA exhibits.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
PAPER-CHANGING FINDING: FA is dramatically different from DFA on the
same architecture. FA has genuine deep credit quality where DFA has none.

Implementation:
- experiments/cifar_resmlp.py: added train_fa() + FA diagnostic support
  FA uses sequential backward credit propagation with d×d random matrices
  (a_l = B_l @ a_{l+1}) instead of DFA's direct output-error projection
  (a_l = B_l^T @ e_T). Same local loss form &lt;f_l, a_l&gt;.

Core results (A-H, 100ep 3-seed d=256 terminal-LN ResMLP):

  FA main audit:    0.401 ± 0.009 (DFA: 0.306 ± 0.008)  +9.5 pp
  FA vs frozen:     +5.2 pp ABOVE baseline (DFA: -4.3 pp below)
  FA deep cos:      +0.33 (DFA: ~0 degenerate)
  FA ||h_L||:       ~10^5 (DFA: ~5×10^8)  3 orders of magnitude less growth
  FA ||g_L||:       ~10^-6 meaningful (DFA: ~10^-10 floor)
  Mode 1(b) fires:  NO for FA; YES for DFA

  FA+pen lam=1e-2:  0.369 ± 0.003 (DFA+pen: 0.360 ± 0.002)
  FA+pen lam=1e-4:  0.377 ± 0.006 (DFA+pen lam=1e-4: 0.360)
    At lam=1e-4, FA already has deep cos +0.30 while DFA has -0.02

  FA random-target: acc 0.12 (chance), h_L=1.3e5 (DFA: 1.7e8)
  FA early 5ep:     deep cos already +0.32 (DFA ep1: -0.008)

Extension results (d=512 depth sweep, 100ep, s42):
  L=2:  FA 0.350, cos +0.96  (DFA: n/a)
  L=4:  FA 0.424, cos +0.29  (DFA: n/a)
  L=6:  FA 0.401, cos +0.16  (DFA: n/a)
  L=8:  FA 0.409, cos +0.11  (DFA: 0.306, cos -0.0001)
  L=12: FA 0.404, cos +0.09  (DFA: 0.309, cos -0.0001)

FA deep cos is positive at EVERY depth; DFA is ~0 everywhere.
FA accuracy exceeds DFA by about 10 pp at L=8 (10.3 pp) and L=12 (9.5 pp).

This is the strongest empirical support for the Mode 2 → Mode 1
hypothesis: same local loss, same architecture, same optimizer —
only the credit signal differs. FA's sequential propagation produces
much better per-layer credit (cos +0.33 vs ~0), which prevents the
catastrophic activation growth that DFA exhibits.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Sync experiment+protocol scripts with v2.32 corrected control values</title>
<updated>2026-04-09T00:24:06+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-09T00:24:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=2fa24acae8bb7f8c026db2f7fdade4a29b640d8d'/>
<id>2fa24acae8bb7f8c026db2f7fdade4a29b640d8d</id>
<content type='text'>
The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to
the paper), so the stale values didn't cause numerical errors in the
paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug
came from exactly this kind of hardcoded "for-comparison" comment that
was never measured. Updating them now to remove the same trap from
future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep,
results/dfa_pen_short, and results/bp_with_penalty.

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale
numbers — these are project scratch documents and not user-facing.
Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to
the paper), so the stale values didn't cause numerical errors in the
paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug
came from exactly this kind of hardcoded "for-comparison" comment that
was never measured. Updating them now to remove the same trap from
future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep,
results/dfa_pen_short, and results/bp_with_penalty.

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale
numbers — these are project scratch documents and not user-facing.
Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Round 38: add --penalty_lam flag to cifar_resmlp.py for Mode 2 cross-method test</title>
<updated>2026-04-08T11:37:23+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T11:37:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=b4d276f6a4b20c7766e0bceb687e42ecd4869fef'/>
<id>b4d276f6a4b20c7766e0bceb687e42ecd4869fef</id>
<content type='text'>
Patches:
- main(): add --penalty_lam (separate from CB's bridge temperature args.lam)
- train_dfa block update (line 195): add penalty_lam * (f_l**2).sum(-1).mean()
- train_state_bridge block update (line 326): same penalty
- train_credit_bridge block update (line 533): same penalty

Codex round 38 GO STAGE: keep penalty separate from CB lam, blocks-only,
sanity-check that hidden_norms remain nontrivial (not silencing the blocks).

2-epoch smoke (results/round38_smoke_sbcb_pen) passes the silencing check:
SB ||h_L||=229, CB ||h_L||=1258, both nontrivial. Deep cosines positive across
all layers for SB ([0.28, 0.25, 0.23]) and rising for CB ([0.04, 0.08, 0.13, 0.15]).

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patches:
- main(): add --penalty_lam (separate from CB's bridge temperature args.lam)
- train_dfa block update (line 195): add penalty_lam * (f_l**2).sum(-1).mean()
- train_state_bridge block update (line 326): same penalty
- train_credit_bridge block update (line 533): same penalty

Codex round 38 GO STAGE: keep penalty separate from CB lam, blocks-only,
sanity-check that hidden_norms remain nontrivial (not silencing the blocks).

2-epoch smoke (results/round38_smoke_sbcb_pen) passes the silencing check:
SB ||h_L||=229, CB ||h_L||=1258, both nontrivial. Deep cosines positive across
all layers for SB ([0.28, 0.25, 0.23]) and rising for CB ([0.04, 0.08, 0.13, 0.15]).

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Round 36: upgrade (b) wording + add EP random-target neg control to §3</title>
<updated>2026-04-08T11:11:25+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T11:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=02c3d2c80805daedb2b6c8e9d6e5f36c52d361a1'/>
<id>02c3d2c80805daedb2b6c8e9d6e5f36c52d361a1</id>
<content type='text'>
Two changes from round 36:
1. §3 paragraph 3: replace 'observational association' with full causal claim
   based on existing April 7 no-out_ln data (3 seeds, ResMLP-d256+terminal-LN
   removed, residual skip kept): ||h_L||=1.21e7 (Mode 1 (a) still fires) but
   ||g_L||=7.4e-4 (HEALTHY, ~10000x above floor — (b) eliminated). Final acc
   0.327±0.013 indistinguishable from vanilla DFA's 0.308±0.014. Wording
   upgraded to 'terminal LayerNorm is necessary for Mode 1(b) in the audited
   residual ResMLP and ViT-Mini setting'.

2. §3 paragraph after random-target ablation: add EP under random targets
   smoke result (||h_L||=586 at ep 5 vs DFA's 14510 at ep 3, 25x gap).
   Random-target assay now cleanly separates fixed-feedback methods (explode)
   from EP (bounded). Cross-method negative control complete.

- experiments/ep_baseline.py: add --random_targets flag + train_ep parameter
- v2.5 paper compiles to 15 pages, main content 1-9 (right at E&amp;D limit)

Combined picture (rounds 32-36):
- Mode 1 (a) localized to 'fixed-feedback local-credit objectives without
  scale control on architectures absorbing scale at output'. Falsified:
  residual skip (round 33), task signal (round 34), DFA-specific (round 35).
  EP is the working negative control (round 36).
- Mode 1 (b) localized to terminal LayerNorm via the 1/||h|| Jacobian.
  Causally established by April 7 no_outln 3-seed data.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Two changes from round 36:
1. §3 paragraph 3: replace 'observational association' with full causal claim
   based on existing April 7 no-out_ln data (3 seeds, ResMLP-d256+terminal-LN
   removed, residual skip kept): ||h_L||=1.21e7 (Mode 1 (a) still fires) but
   ||g_L||=7.4e-4 (HEALTHY, ~10000x above floor — (b) eliminated). Final acc
   0.327±0.013 indistinguishable from vanilla DFA's 0.308±0.014. Wording
   upgraded to 'terminal LayerNorm is necessary for Mode 1(b) in the audited
   residual ResMLP and ViT-Mini setting'.

2. §3 paragraph after random-target ablation: add EP under random targets
   smoke result (||h_L||=586 at ep 5 vs DFA's 14510 at ep 3, 25x gap).
   Random-target assay now cleanly separates fixed-feedback methods (explode)
   from EP (bounded). Cross-method negative control complete.

- experiments/ep_baseline.py: add --random_targets flag + train_ep parameter
- v2.5 paper compiles to 15 pages, main content 1-9 (right at E&amp;D limit)

Combined picture (rounds 32-36):
- Mode 1 (a) localized to 'fixed-feedback local-credit objectives without
  scale control on architectures absorbing scale at output'. Falsified:
  residual skip (round 33), task signal (round 34), DFA-specific (round 35).
  EP is the working negative control (round 36).
- Mode 1 (b) localized to terminal LayerNorm via the 1/||h|| Jacobian.
  Causally established by April 7 no_outln 3-seed data.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Round 35: SB and CB also show data-agnostic Mode 1 growth on random targets</title>
<updated>2026-04-08T10:57:53+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T10:57:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=be39c2b5ebec37f993b1a862459455a98cf39eb2'/>
<id>be39c2b5ebec37f993b1a862459455a98cf39eb2</id>
<content type='text'>
- experiments/cifar_resmlp.py: add --methods filter and --random_targets flag;
  extend compute_diagnostics to log hidden_norms_per_layer and bp_grad_norms_per_layer
- paper/main.tex §3 ¶1: broaden random-target finding to all 3 fixed-feedback methods
  (DFA: ||h_L||=14510, SB: ||h_L||=6225, CB: ||h_L||=19974 at ep 3, all at chance acc)
- paper/main.tex Appendix J: extended with cross-method smoke-test table

This generalizes the §3 mechanism story from 'DFA-specific' to 'all 3 audited
fixed-feedback local-credit methods'. Combined with rounds 32-34, the proximate
cause of Mode 1 (a) is now well-localized:
  - Does not require residual skip (round 33 H2 walkback)
  - Does not require task signal (round 34 random targets, DFA)
  - Is not DFA-specific (round 35 random targets, SB+CB)

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- experiments/cifar_resmlp.py: add --methods filter and --random_targets flag;
  extend compute_diagnostics to log hidden_norms_per_layer and bp_grad_norms_per_layer
- paper/main.tex §3 ¶1: broaden random-target finding to all 3 fixed-feedback methods
  (DFA: ||h_L||=14510, SB: ||h_L||=6225, CB: ||h_L||=19974 at ep 3, all at chance acc)
- paper/main.tex Appendix J: extended with cross-method smoke-test table

This generalizes the §3 mechanism story from 'DFA-specific' to 'all 3 audited
fixed-feedback local-credit methods'. Combined with rounds 32-34, the proximate
cause of Mode 1 (a) is now well-localized:
  - Does not require residual skip (round 33 H2 walkback)
  - Does not require task signal (round 34 random targets, DFA)
  - Is not DFA-specific (round 35 random targets, SB+CB)

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Round 34 random-target ablation: Mode 1 fires under random labels too</title>
<updated>2026-04-08T10:47:47+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T10:47:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=52693a9be4349c2820ac79e3e3d9af53813a7412'/>
<id>52693a9be4349c2820ac79e3e3d9af53813a7412</id>
<content type='text'>
Codex round 34 picked OPTION A (i.i.d. random class targets per minibatch) over the
analytic-only OPTION D as the most discriminating test of 'is (a) intrinsic to DFA
update geometry or task-driven?'. Smoke test result is unambiguous:

  ep 0: ||h_L||=8.9    ||g_L||=9.8e-4
  ep 1: ||h_L||=1616   ||g_L||=5.1e-6
  ep 2: ||h_L||=9768   ||g_L||=8.5e-7
  ep 3: ||h_L||=14510  ||g_L||=5.6e-7   (test acc still at chance ~0.07)

Three orders of magnitude growth in ||h_L|| in 3 epochs, three orders of magnitude
collapse in ||g_L|| in the same 3 epochs, with NO task signal whatsoever — DFA's
local-loss geometry is the proximate driver, not data adaptation.

- experiments/snapshot_evolution_residual_explosion.py: add --random_targets and
  --skip_bp flags
- paper/main.tex §3 ¶1: replace 'no explicit scale constraint' framing with codex
  round 34's 6-line geometric argument and the random-target empirical falsifier
- paper/main.tex Appendix J: full smoke-test table + interpretation
- v2.3: 14 pages total, main content still 8 pages

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Codex round 34 picked OPTION A (i.i.d. random class targets per minibatch) over the
analytic-only OPTION D as the most discriminating test of 'is (a) intrinsic to DFA
update geometry or task-driven?'. Smoke test result is unambiguous:

  ep 0: ||h_L||=8.9    ||g_L||=9.8e-4
  ep 1: ||h_L||=1616   ||g_L||=5.1e-6
  ep 2: ||h_L||=9768   ||g_L||=8.5e-7
  ep 3: ||h_L||=14510  ||g_L||=5.6e-7   (test acc still at chance ~0.07)

Three orders of magnitude growth in ||h_L|| in 3 epochs, three orders of magnitude
collapse in ||g_L|| in the same 3 epochs, with NO task signal whatsoever — DFA's
local-loss geometry is the proximate driver, not data adaptation.

- experiments/snapshot_evolution_residual_explosion.py: add --random_targets and
  --skip_bp flags
- paper/main.tex §3 ¶1: replace 'no explicit scale constraint' framing with codex
  round 34's 6-line geometric argument and the random-target empirical falsifier
- paper/main.tex Appendix J: full smoke-test table + interpretation
- v2.3: 14 pages total, main content still 8 pages

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Round 32+33 H2 ablation: add no_residual_add flag; falsify residual-as-cause hypothesis</title>
<updated>2026-04-08T10:39:39+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T10:39:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=8dd65b2ec3df32749adabbf62c55101d5b00ae7b'/>
<id>8dd65b2ec3df32749adabbf62c55101d5b00ae7b</id>
<content type='text'>
- models/residual_mlp.py: add residual_add and w2_std flags (default unchanged)
- experiments/snapshot_evolution_residual_explosion.py: add --no_residual_add and --w2_std CLI flags
- paper/main.tex §3 ¶3: add 1-sentence reference to no-residual control showing Mode 1 still fires
- paper/main.tex Appendix I: full smoke-test table + interpretation
- v2.2 main content stays at 8 pages (within 9-page E&amp;D budget); 13 pages total

Smoke test (3 ep, w2_std=0.5, seed 42):
- DFA no-residual: ||h_L|| 4.69 -&gt; 22050, ||g|| 1.6e-7 (Mode 1 (a) fires; (b) at floor)
- BP no-residual: acc only 0.16 at ep 3 (architecture is partially degenerate)
- Conclusion: residual skip is NOT necessary for Mode 1; the proximate trigger is more general
- Codex round 33 verdict: WALK BACK H2; demote 100ep run to confirmatory

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- models/residual_mlp.py: add residual_add and w2_std flags (default unchanged)
- experiments/snapshot_evolution_residual_explosion.py: add --no_residual_add and --w2_std CLI flags
- paper/main.tex §3 ¶3: add 1-sentence reference to no-residual control showing Mode 1 still fires
- paper/main.tex Appendix I: full smoke-test table + interpretation
- v2.2 main content stays at 8 pages (within 9-page E&amp;D budget); 13 pages total

Smoke test (3 ep, w2_std=0.5, seed 42):
- DFA no-residual: ||h_L|| 4.69 -&gt; 22050, ||g|| 1.6e-7 (Mode 1 (a) fires; (b) at floor)
- BP no-residual: acc only 0.16 at ep 3 (architecture is partially degenerate)
- Conclusion: residual skip is NOT necessary for Mode 1; the proximate trigger is more general
- Codex round 33 verdict: WALK BACK H2; demote 100ep run to confirmatory

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
