Diffstat (limited to 'NOTE.md')
-rw-r--r--  NOTE.md  26
1 file changed, 26 insertions, 0 deletions
@@ -623,8 +623,34 @@ Depth-awareness is the minimal requirement (constant_input fails).
 
 **Minimal mechanism**: continuously trainable, non-zero, depth-aware auxiliary perturbation.
 
+### Phase 10A.8: Scaffold Dynamics
+
+**8A: Freeze with Decay**
+
+| Branch | Final | Δ vs DFA | Key |
+|--------|-------|----------|-----|
+| random_trainable_075 | 0.322 | +1.1% | reference |
+| freeze1_fixed075 | 0.146 | -16.6% | stale aux at α=0.75 kills training |
+| freeze1_decay_to_000 | **0.285** | **-2.7%** | decaying to pure DFA recovers most of the gap |
+| freeze5_decay_to_000 | 0.285 | -2.6% | same as freeze1 |
+
+Freeze failure is mostly caused by the stale, high-weight aux (decay_to_000 recovers to 28.5%).
+But 28.5% is still below DFA's 31.2%, so continuous trainability adds roughly 2.7% on top.
+
+**8B: Alpha Sweep**
+
+| Method | α=0.25 | α=0.50 | α=0.75 | α=0.90 |
+|--------|--------|--------|--------|--------|
+| perlayer_vector | +0.0% | +0.6% | **+1.1%** | -1.4% |
+| random_trainable | +0.1% | +0.4% | **+0.8%** | -0.1% |
+
+Both methods peak at α=0.75, where perlayer_vector (+1.1%) ≈ random_trainable (+0.8%).
+The per-layer vector is therefore the minimal sufficient scaffold.
+
 ### Experiment IDs (Phase 10)
 - `prefit_threshold/`: Phase 10A prefit threshold curve
 - `blend_dissection/`: Phase 10A.5 blend mechanism dissection
 - `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
 - `minimal_aux_compression/`: Phase 10A.7 minimal aux compression
+- `freeze_with_decay/`: Phase 10A.8A freeze with decay
+- `alpha_sweep_scaffold/`: Phase 10A.8B alpha sweep
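The 8A branch names appear to encode an aux-weight schedule: an initial α, a freeze point, and an optional decay target. The 8A difference column appears to be the gap to DFA's 31.2% in points. Below is a minimal Python sketch of both, under loudly stated assumptions. The note does not give any of the following, so each one is hypothetical:

- the convex blend `(1 - α)·dfa + α·aux`
- "freezeN" meaning aux parameters stop training from epoch N
- a linear decay over `decay_epochs`
- the names `AuxSchedule`, `blend`, `delta_vs_dfa`, `BRANCHES`, `SWEEP` and `best_alpha`

Only the numbers are copied from the note. A constant-α run from the 8B sweep would simply be `AuxSchedule(alpha0=α)`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# DFA final value quoted in the note (31.2%); everything else is illustrative.
DFA_BASELINE = 0.312


@dataclass(frozen=True)
class AuxSchedule:
    """Hypothetical aux-weight schedule mirroring the 8A branch names.

    alpha0:       initial aux blend weight (0.75 in 8A)
    freeze_epoch: epoch from which aux parameters stop training (None = never)
    alpha_end:    weight the frozen aux decays to (None = stays at alpha0)
    decay_epochs: length of the ASSUMED linear decay after freezing
    """
    alpha0: float
    freeze_epoch: Optional[int] = None
    alpha_end: Optional[float] = None
    decay_epochs: int = 10

    def aux_trainable(self, epoch: int) -> bool:
        # Aux parameters keep training until the freeze epoch, if there is one.
        return self.freeze_epoch is None or epoch < self.freeze_epoch

    def alpha(self, epoch: int) -> float:
        # Constant alpha0 until the freeze; afterwards interpolate linearly
        # toward alpha_end, reaching it decay_epochs epochs after the freeze.
        if self.freeze_epoch is None or self.alpha_end is None:
            return self.alpha0
        if epoch < self.freeze_epoch:
            return self.alpha0
        frac = min(1.0, (epoch - self.freeze_epoch) / self.decay_epochs)
        return self.alpha0 + frac * (self.alpha_end - self.alpha0)


def blend(dfa_signal: float, aux_signal: float, alpha: float) -> float:
    # ASSUMED convex blend; the note does not spell out the exact formula.
    return (1.0 - alpha) * dfa_signal + alpha * aux_signal


def delta_vs_dfa(final: float, baseline: float = DFA_BASELINE) -> float:
    """Gap to the DFA baseline in percentage points, rounded to 0.1."""
    return round((final - baseline) * 100, 1)


# 8A branches expressed as hypothetical schedules.
BRANCHES = {
    "random_trainable_075": AuxSchedule(0.75),
    "freeze1_fixed075": AuxSchedule(0.75, freeze_epoch=1),
    "freeze1_decay_to_000": AuxSchedule(0.75, freeze_epoch=1, alpha_end=0.0),
    "freeze5_decay_to_000": AuxSchedule(0.75, freeze_epoch=5, alpha_end=0.0),
}

# 8B differences copied from the note, keyed by alpha.
SWEEP = {
    "perlayer_vector": {0.25: 0.0, 0.50: 0.6, 0.75: 1.1, 0.90: -1.4},
    "random_trainable": {0.25: 0.1, 0.50: 0.4, 0.75: 0.8, 0.90: -0.1},
}


def best_alpha(row: Dict[float, float]) -> float:
    # Alpha with the largest reported difference.
    return max(row, key=row.get)


if __name__ == "__main__":
    for name, sched in BRANCHES.items():
        print(name, [sched.alpha(e) for e in (0, 1, 5, 10, 20)], sched.aux_trainable(3))
    for final in (0.322, 0.146, 0.285):
        print(final, delta_vs_dfa(final))
    for method, row in SWEEP.items():
        print(method, best_alpha(row))
```

Recomputing from the rounded finals gives +1.0 for random_trainable_075 and -2.7 for freeze5_decay_to_000. The table shows +1.1 and -2.6, which suggests the note's differences were computed from unrounded values.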
