From 4d6e689fe6bfffef6db7a4650aec210cd3eeed5c Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Fri, 27 Mar 2026 16:39:17 -0500
Subject: Add Phase 10A.8: freeze-with-decay confirms stale aux is main freeze failure cause; alpha sweep shows perlayer_vector at alpha=0.75 matches full network
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

10A.8A: freeze_decay_to_000 recovers to 28.5% (vs 14.6% with a fixed
freeze), so stale high-weight aux is the primary cause of freeze
crashes. But 28.5% < DFA 31.2%, which indicates that continuous
trainability adds ~2.7% of independent value.

10A.8B: perlayer_vector and random_trainable are both optimal at
alpha=0.75 (perlayer_vector +1.1% vs random_trainable +0.8%), so the
per-layer vector is the minimal sufficient scaffold; no network is
needed.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 NOTE.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/NOTE.md b/NOTE.md
index 2882547..90057af 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -623,8 +623,34 @@ Depth-awareness is the minimal requirement (constant_input fails).
 
 **Minimal mechanism**: continuously trainable, non-zero, depth-aware auxiliary perturbation.
 
+### Phase 10A.8: Scaffold Dynamics
+
+**8A: Freeze with Decay**
+
+| Branch | final | Δ vs DFA | Key |
+|--------|-------|----------|-----|
+| random_trainable_075 | 0.322 | +1.1% | reference |
+| freeze1_fixed075 | 0.146 | -16.6% | stale aux at 75% weight collapses accuracy |
+| freeze1_decay_to_000 | **0.285** | **-2.7%** | decay to DFA recovers most |
+| freeze5_decay_to_000 | 0.285 | -2.6% | same as freeze1 |
+
+Freeze failure is mostly caused by stale high-weight aux (decay_to_000 recovers to 28.5%).
+But 28.5% < DFA 31.2% → continuous trainability adds ~2.7% of additional value.
+
+**8B: Alpha Sweep**
+
+| Method | α=0.25 | α=0.50 | α=0.75 | α=0.90 |
+|--------|--------|--------|--------|--------|
+| perlayer_vector | +0.0% | +0.6% | **+1.1%** | -1.4% |
+| random_trainable | +0.1% | +0.4% | **+0.8%** | -0.1% |
+
+Both methods peak at α=0.75, where perlayer_vector (+1.1%) ≈ random_trainable (+0.8%).
+A per-layer vector is therefore the minimal sufficient scaffold; no network is needed.
+
 ### Experiment IDs (Phase 10)
 - `prefit_threshold/`: Phase 10A prefit threshold curve
 - `blend_dissection/`: Phase 10A.5 blend mechanism dissection
 - `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
 - `minimal_aux_compression/`: Phase 10A.7 minimal aux compression
+- `freeze_with_decay/`: Phase 10A.8A freeze with decay
+- `alpha_sweep_scaffold/`: Phase 10A.8B alpha sweep
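The delta column of the 8A table appears to be the percentage-point gap between each branch's final accuracy and the DFA baseline (31.2%) quoted in the note. The Python sketch below is a hypothetical consistency check, not part of the experiment code: assuming that reading, it recomputes the column from the rounded finals and picks the best α from the 8B sweep. The names `delta_pp`, `check_table`, and `best_alpha` are illustrative only.

```python
# Hypothetical consistency check for the Phase 10A.8 tables.
# ASSUMPTION: the 8A delta column = (final - DFA final) in percentage points,
# with DFA = 0.312 as quoted in the note. Finals are the rounded table values.
DFA_FINAL = 0.312

FINALS_8A = {
    "random_trainable_075": 0.322,
    "freeze1_fixed075": 0.146,
    "freeze1_decay_to_000": 0.285,
    "freeze5_decay_to_000": 0.285,
}

# Deltas as reported in the 8A table (percentage points).
REPORTED_PP = {
    "random_trainable_075": 1.1,
    "freeze1_fixed075": -16.6,
    "freeze1_decay_to_000": -2.7,
    "freeze5_decay_to_000": -2.6,
}

# 8B sweep values as reported (percentage points), keyed by alpha.
SWEEP_8B = {
    "perlayer_vector": {0.25: 0.0, 0.50: 0.6, 0.75: 1.1, 0.90: -1.4},
    "random_trainable": {0.25: 0.1, 0.50: 0.4, 0.75: 0.8, 0.90: -0.1},
}


def delta_pp(final: float, baseline: float = DFA_FINAL) -> float:
    """Gap to the baseline in percentage points, rounded to one decimal."""
    return round((final - baseline) * 100, 1)


def check_table(tolerance_pp: float = 0.15) -> dict[str, tuple[float, float]]:
    """Return {branch: (recomputed, reported)}.

    Raises ValueError if any pair differs by more than the rounding tolerance.
    """
    rows = {}
    for branch, final in FINALS_8A.items():
        recomputed = delta_pp(final)
        reported = REPORTED_PP[branch]
        if abs(recomputed - reported) > tolerance_pp:
            raise ValueError(f"{branch}: {recomputed} vs reported {reported}")
        rows[branch] = (recomputed, reported)
    return rows


def best_alpha(method: str) -> float:
    """Alpha with the largest reported gain for one 8B method."""
    sweep = SWEEP_8B[method]
    return max(sweep, key=sweep.get)


if __name__ == "__main__":
    for branch, (recomputed, reported) in check_table().items():
        print(f"{branch:24s} recomputed {recomputed:+.1f} pp, reported {reported:+.1f} pp")
    for method in SWEEP_8B:
        print(f"{method}: best alpha = {best_alpha(method)}")
```

Recomputed from the rounded finals, random_trainable_075 comes out at +1.0 pp and freeze5_decay_to_000 at -2.7 pp, each 0.1 pp away from the reported +1.1% and -2.6%. That gap is what one would expect if the deltas were computed from unrounded accuracies, though the note does not say so. `best_alpha` returns 0.75 for both methods, matching the 8B conclusion.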