<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/NOTE.md, branch master</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>Update NOTE.md + EVIDENCE_SUMMARY.md with FA results (2026-04-23)</title>
<updated>2026-04-23T16:18:59+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-23T16:18:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=5937af903fdcb473cb3dd39cd3d0a86c1dbe0a05'/>
<id>5937af903fdcb473cb3dd39cd3d0a86c1dbe0a05</id>
<content type='text'>
NOTE.md: added comprehensive current-status section at the top with
the full 6-method audit table (BP/FA/EP/DFA/CB/SB), FA vs DFA key
comparison, depth sweep, penalty rescue comparison, cross-method
functional triangulation, and open items. Old Phase 10A content kept
below as historical reference.

EVIDENCE_SUMMARY.md: added "Vanilla FA vs DFA" section with the
paper-changing finding (FA 0.401 ± 0.009 vs DFA 0.306 ± 0.008,
FA has genuine deep cos +0.33, no Mode 1(b) collapse) and the
d=512 depth sweep table.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
NOTE.md: added comprehensive current-status section at the top with
the full 6-method audit table (BP/FA/EP/DFA/CB/SB), FA vs DFA key
comparison, depth sweep, penalty rescue comparison, cross-method
functional triangulation, and open items. Old Phase 10A content kept
below as historical reference.

EVIDENCE_SUMMARY.md: added "Vanilla FA vs DFA" section with the
paper-changing finding (FA 0.401 ± 0.009 vs DFA 0.306 ± 0.008,
FA has genuine deep cos +0.33, no Mode 1(b) collapse) and the
d=512 depth sweep table.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 10A.8C: 3-seed replication — scaffold gains are marginal</title>
<updated>2026-03-27T23:07:58+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-27T23:07:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=2a230acd5ee3fa6605892d524badf281ba7e9cfd'/>
<id>2a230acd5ee3fa6605892d524badf281ba7e9cfd</id>
<content type='text'>
3-seed results (mean±std):
- DFA: 0.306±0.006
- perlayer_vector α=0.75: 0.304±0.006 (-0.2%, not significant)
- random_trainable α=0.75: 0.313±0.007 (+0.7%, marginal, error bars overlap)

Single-seed gains (+1.1% perlayer_vector, +0.8% random_trainable) do not robustly replicate.
The scaffold mechanism provides at best a marginal, statistically uncertain benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
3-seed results (mean±std):
- DFA: 0.306±0.006
- perlayer_vector α=0.75: 0.304±0.006 (-0.2%, not significant)
- random_trainable α=0.75: 0.313±0.007 (+0.7%, marginal, error bars overlap)

Single-seed gains (+1.1% perlayer_vector, +0.8% random_trainable) do not robustly replicate.
The scaffold mechanism provides at best a marginal, statistically uncertain benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 10A.8: freeze-with-decay confirms stale aux is main freeze failure cause;</title>
<updated>2026-03-27T21:39:17+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-27T21:39:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=4d6e689fe6bfffef6db7a4650aec210cd3eeed5c'/>
<id>4d6e689fe6bfffef6db7a4650aec210cd3eeed5c</id>
<content type='text'>
alpha sweep shows perlayer_vector at alpha=0.75 matches full network

10A.8A: freeze_decay_to_000 recovers to 28.5% (vs 14.6% fixed freeze) — stale
high-weight aux is the primary cause of freeze crashes. But 28.5% is still below DFA's 31.2%,
confirming that continuous trainability adds ~2.7 percentage points of independent value.

10A.8B: Both perlayer_vector and random_trainable optimal at alpha=0.75.
perlayer_vector +1.1% vs random_trainable +0.8% — per-layer vector is
the minimal sufficient scaffold, no network needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
alpha sweep shows perlayer_vector at alpha=0.75 matches full network

10A.8A: freeze_decay_to_000 recovers to 28.5% (vs 14.6% fixed freeze) — stale
high-weight aux is the primary cause of freeze crashes. But 28.5% is still below DFA's 31.2%,
confirming that continuous trainability adds ~2.7 percentage points of independent value.

10A.8B: Both perlayer_vector and random_trainable optimal at alpha=0.75.
perlayer_vector +1.1% vs random_trainable +0.8% — per-layer vector is
the minimal sufficient scaffold, no network needed.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 10A.7: minimal aux compression — continuous trainability is essential</title>
<updated>2026-03-27T19:25:00+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-27T19:25:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=65d97ad1ef4b552103420e6501655df192c98d57'/>
<id>65d97ad1ef4b552103420e6501655df192c98d57</id>
<content type='text'>
8-branch dissection:
- zero_target + normmatched both crash: non-zero direction necessary, not norm
- perlayer_vector: +0.7% (per-block trainable vector works, network not required)
- freeze_after_{1,5,10}: ALL crash to ~13-14% (continuous trainability essential)
- random_trainable: +1.0% (reference)

Minimal mechanism: continuously trainable, non-zero, depth-aware auxiliary perturbation.
Freezing at ANY point destroys the benefit entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
8-branch dissection:
- zero_target + normmatched both crash: non-zero direction necessary, not norm
- perlayer_vector: +0.7% (per-block trainable vector works, network not required)
- freeze_after_{1,5,10}: ALL crash to ~13-14% (continuous trainability essential)
- random_trainable: +1.0% (reference)

Minimal mechanism: continuously trainable, non-zero, depth-aware auxiliary perturbation.
Freezing at ANY point destroys the benefit entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 10A.6: gain requires trainable depth-aware aux, not semantic credit</title>
<updated>2026-03-27T03:07:35+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-27T03:07:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=b4e3cbeae6cb4cf4a4b69b84a475afcd7d7e9dbe'/>
<id>b4e3cbeae6cb4cf4a4b69b84a475afcd7d7e9dbe</id>
<content type='text'>
9-branch dissection results:
- zero_target crashes (-9.1%): aux must output non-zero
- constant_input neutral (+0.0%): needs at least depth info
- time_only works (+1.0%): h_l not needed, just depth index
- shuffled/fresh_random work (+1.3-1.4%): no semantic content needed
- prefit60_trainable ≈ random_trainable: prefit adds nothing
- All frozen branches crash: trainability is essential

Mechanism: depth-aware trainable auxiliary perturbation that diversifies
block-local updates. Not semantic credit, not pure trainability.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
9-branch dissection results:
- zero_target crashes (-9.1%): aux must output non-zero
- constant_input neutral (+0.0%): needs at least depth info
- time_only works (+1.0%): h_l not needed, just depth index
- shuffled/fresh_random work (+1.3-1.4%): no semantic content needed
- prefit60_trainable ≈ random_trainable: prefit adds nothing
- All frozen branches crash: trainability is essential

Mechanism: depth-aware trainable auxiliary perturbation that diversifies
block-local updates. Not semantic credit, not pure trainability.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 10A.5: blend gain is implicit regularization, not learned credit</title>
<updated>2026-03-26T21:27:53+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-26T21:27:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=610e1169e19378cccd2d9b92a588c24dca7f3df7'/>
<id>610e1169e19378cccd2d9b92a588c24dca7f3df7</id>
<content type='text'>
Dissection of 6 branches from same DFA checkpoint:
- blend_random_frozen: 12.6% (CATASTROPHIC — frozen noise destroys training)
- blend_random_trainable: 32.2% (+1.2% — trainable network helps)
- blend_shuffled_trainable: 32.5% (+1.4% — even wrong targets work!)
- blend_gaussian_noise: 30.8% (neutral)
- scaled_DFA_norm_match: 31.0% (neutral)

The gain comes from implicit regularization through a co-optimized auxiliary
network, NOT from learned credit quality. Phase 9A's +1.5% was an optimization
dynamics effect, not evidence of useful credit assignment.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dissection of 6 branches from same DFA checkpoint:
- blend_random_frozen: 12.6% (CATASTROPHIC — frozen noise destroys training)
- blend_random_trainable: 32.2% (+1.2% — trainable network helps)
- blend_shuffled_trainable: 32.5% (+1.4% — even wrong targets work!)
- blend_gaussian_noise: 30.8% (neutral)
- scaled_DFA_norm_match: 31.0% (neutral)

The gain comes from implicit regularization through a co-optimized auxiliary
network, NOT from learned credit quality. Phase 9A's +1.5% was an optimization
dynamics effect, not evidence of useful credit assignment.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 10A: no prefit threshold — even random Vec blend beats DFA by +1.3%</title>
<updated>2026-03-26T13:37:39+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-26T13:37:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=ef4aed70130e2212b4ed1cb7212e2ea6c7c7adb2'/>
<id>ef4aed70130e2212b4ed1cb7212e2ea6c7c7adb2</id>
<content type='text'>
E_prefit=0 (random Vec) + blend(0.75): 32.4% vs DFA 31.1% (+1.3%)
E_prefit=15: 32.3% (+1.2%)
E_prefit=60: 32.5% (+1.4%)

Frozen Gamma/rho near zero at all prefit levels. The Phase 9A success was NOT
from Vec learning useful credit — it was from the blend mechanism itself providing
regularization/diversification over pure DFA.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
E_prefit=0 (random Vec) + blend(0.75): 32.4% vs DFA 31.1% (+1.3%)
E_prefit=15: 32.3% (+1.2%)
E_prefit=60: 32.5% (+1.4%)

Frozen Gamma/rho near zero at all prefit levels. The Phase 9A success was NOT
from Vec learning useful credit — it was from the blend mechanism itself providing
regularization/diversification over pure DFA.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 9B+9C: periodic refit fails, top-down curriculum neutral</title>
<updated>2026-03-26T05:07:01+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-26T05:07:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=05ccd23154d1e9d090178b9d4d5f2c821711e784'/>
<id>05ccd23154d1e9d090178b9d4d5f2c821711e784</id>
<content type='text'>
Phase 9B (periodic refit K=5 R=1 alpha=0.75): 14.0% — Vec starts random,
periodic refits insufficient without offline pretraining.

Phase 9C (top-down curriculum): last1_vec=30.8%, last2_vec=31.1% vs DFA=31.2%.
Near-neutral. Cold-start problem persists even for single-block Vec.

Only Phase 9A's offline prefit + blend handoff (+1.5%) works.
The key ingredient is offline Vec training on frozen checkpoint features.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Phase 9B (periodic refit K=5 R=1 alpha=0.75): 14.0% — Vec starts random,
periodic refits insufficient without offline pretraining.

Phase 9C (top-down curriculum): last1_vec=30.8%, last2_vec=31.1% vs DFA=31.2%.
Near-neutral. Cold-start problem persists even for single-block Vec.

Only Phase 9A's offline prefit + blend handoff (+1.5%) works.
The key ingredient is offline Vec training on frozen checkpoint features.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Full Phase 9A: blend(0.75) outperforms DFA by +1.5% across multiple t0</title>
<updated>2026-03-26T04:03:32+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-26T04:03:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=ccc6add69553893f6d3f9de4e2010ca8139ba1a6'/>
<id>ccc6add69553893f6d3f9de4e2010ca8139ba1a6</id>
<content type='text'>
Best configs (seed=42):
- t0=5, blend_075 (75%Vec+25%DFA): 32.6% vs DFA 31.0% (+1.5%)
- t0=10, blend_075: 32.5% vs 31.0% (+1.4%)
- t0=1, blend_05: 31.9% vs 31.0% (+0.9%)

Higher Vec fraction (0.75) consistently outperforms lower (0.25, 0.5) at t0&gt;=5.
Pure Vec handoff still fails at all checkpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Best configs (seed=42):
- t0=5, blend_075 (75%Vec+25%DFA): 32.6% vs DFA 31.0% (+1.5%)
- t0=10, blend_075: 32.5% vs 31.0% (+1.4%)
- t0=1, blend_05: 31.9% vs 31.0% (+0.9%)

Higher Vec fraction (0.75) consistently outperforms lower (0.25, 0.5) at t0&gt;=5.
Pure Vec handoff still fails at all checkpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add Phase 9A: checkpointed handoff — blend(Vec+DFA) outperforms pure DFA</title>
<updated>2026-03-25T21:20:53+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-03-25T21:20:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=5a3b20d627eca65612f598c1ba5807d5d2df029a'/>
<id>5a3b20d627eca65612f598c1ba5807d5d2df029a</id>
<content type='text'>
First positive online result: 50% blend of offline-fitted Vec + DFA gives 31.7%
vs 31.1% for pure DFA (+0.55%). This is Case B: pure Vec handoff fails (-1.1%)
but blend works because DFA stabilizes trajectory while Vec adds directional credit.

Offline-fitted Vec at DFA epoch-5 checkpoint: Gamma=0.229, rho=0.262.
Cold-start confirmed as main bottleneck — Vec IS useful on DFA trajectory features.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
First positive online result: 50% blend of offline-fitted Vec + DFA gives 31.7%
vs 31.1% for pure DFA (+0.55%). This is Case B: pure Vec handoff fails (-1.1%)
but blend works because DFA stabilizes trajectory while Vec adds directional credit.

Offline-fitted Vec at DFA epoch-5 checkpoint: Gamma=0.229, rho=0.262.
Cold-start confirmed as main bottleneck — Vec IS useful on DFA trajectory features.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
