| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-27 16:39:17 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-27 16:39:17 -0500 |
| commit | 4d6e689fe6bfffef6db7a4650aec210cd3eeed5c (patch) | |
| tree | fa8b6d123a51bab4b17a07f787cc89e74584397f /NOTE.md | |
| parent | 65d97ad1ef4b552103420e6501655df192c98d57 (diff) | |
Add Phase 10A.8: freeze-with-decay confirms stale aux is main freeze failure cause;
alpha sweep shows perlayer_vector at alpha=0.75 matches full network
10A.8A: freeze_decay_to_000 recovers to 28.5% (vs 14.6% fixed freeze) — stale
high-weight aux is the primary cause of freeze crashes. But 28.5% < DFA 31.2%
confirms continuous trainability adds ~2.7% independent value.
10A.8B: Both perlayer_vector and random_trainable optimal at alpha=0.75.
perlayer_vector +1.1% vs random_trainable +0.8% — per-layer vector is
the minimal sufficient scaffold, no network needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'NOTE.md')
| -rw-r--r-- | NOTE.md | 26 |
1 files changed, 26 insertions, 0 deletions
@@ -623,8 +623,34 @@ Depth-awareness is the minimal requirement (constant_input fails).
 
 **Minimal mechanism**: continuously trainable, non-zero, depth-aware auxiliary perturbation.
 
+### Phase 10A.8: Scaffold Dynamics
+
+**8A: Freeze with Decay**
+
+| Branch | final | diff | Key |
+|--------|-------|------|-----|
+| random_trainable_075 | 0.322 | +1.1% | reference |
+| freeze1_fixed075 | 0.146 | -16.6% | stale aux at 75% kills |
+| freeze1_decay_to_000 | **0.285** | **-2.7%** | decay to DFA recovers most |
+| freeze5_decay_to_000 | 0.285 | -2.6% | same |
+
+Freeze failure is MOSTLY stale high-weight aux (decay_to_000 recovers to 28.5%).
+But 28.5% < DFA 31.2% → continuous trainability adds ~2.7% additional value.
+
+**8B: Alpha Sweep**
+
+| Method | α=0.25 | α=0.50 | α=0.75 | α=0.90 |
+|--------|--------|--------|--------|--------|
+| perlayer_vector | +0.0% | +0.6% | **+1.1%** | -1.4% |
+| random_trainable | +0.1% | +0.4% | **+0.8%** | -0.1% |
+
+Both methods optimal at α=0.75. perlayer_vector (+1.1%) ≈ random_trainable (+0.8%).
+Per-layer vector is the minimal sufficient scaffold.
+
 ### Experiment IDs (Phase 10)
 - `prefit_threshold/`: Phase 10A prefit threshold curve
 - `blend_dissection/`: Phase 10A.5 blend mechanism dissection
 - `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
 - `minimal_aux_compression/`: Phase 10A.7 minimal aux compression
+- `freeze_with_decay/`: Phase 10A.8A freeze with decay
+- `alpha_sweep_scaffold/`: Phase 10A.8B alpha sweep
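
The `diff` column in the 8A table appears to be the gap to the DFA baseline (31.2%) in percentage points. The sketch below recomputes that gap from the rounded `final` values in the table. It assumes the baseline is exactly 0.312; the names `DFA_BASELINE`, `final_accuracy`, and `diff_points` are illustrative and do not come from the repository.

```python
# Assumed baseline: the note quotes DFA at 31.2%.
DFA_BASELINE = 0.312

# Rounded "final" accuracies copied from the 8A table.
final_accuracy = {
    "random_trainable_075": 0.322,
    "freeze1_fixed075": 0.146,
    "freeze1_decay_to_000": 0.285,
    "freeze5_decay_to_000": 0.285,
}


def diff_points(final: float, baseline: float = DFA_BASELINE) -> float:
    """Gap to the baseline in percentage points, rounded to one decimal."""
    return round((final - baseline) * 100, 1)


diffs = {name: diff_points(acc) for name, acc in final_accuracy.items()}

for name, gap in diffs.items():
    print(f"{name:24s} {gap:+.1f} pts")
```

Recomputed from the rounded values, `random_trainable_075` and `freeze5_decay_to_000` land 0.1 points away from the table entries (+1.1% and -2.6%). That would be consistent with the table having been computed from unrounded accuracies, though the commit does not say so.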
