Diffstat (limited to 'NOTE.md')
-rw-r--r-- NOTE.md 26
1 files changed, 26 insertions, 0 deletions
diff --git a/NOTE.md b/NOTE.md
index 2882547..90057af 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -623,8 +623,34 @@ Depth-awareness is the minimal requirement (constant_input fails).
**Minimal mechanism**: continuously trainable, non-zero, depth-aware auxiliary perturbation.
+### Phase 10A.8: Scaffold Dynamics
+
+**8A: Freeze with Decay**
+
+| Branch | Final | Diff vs. DFA | Note |
+|--------|-------|------|-----|
+| random_trainable_075 | 0.322 | +1.1% | reference |
+| freeze1_fixed075 | 0.146 | -16.6% | stale aux at 75% kills |
+| freeze1_decay_to_000 | **0.285** | **-2.7%** | decay to DFA recovers most |
+| freeze5_decay_to_000 | 0.285 | -2.6% | same |
+
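+Freeze failure is mostly caused by the stale, high-weight aux: decaying it to zero (decay_to_000) recovers the final score to 28.5%.
+That is still below the DFA baseline of 31.2%, so continuous trainability accounts for roughly 2.7 percentage points of additional value.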
+
+**8B: Alpha Sweep**
+
+| Method | α=0.25 | α=0.50 | α=0.75 | α=0.90 |
+|--------|--------|--------|--------|--------|
+| perlayer_vector | +0.0% | +0.6% | **+1.1%** | -1.4% |
+| random_trainable | +0.1% | +0.4% | **+0.8%** | -0.1% |
+
+Both methods peak at α=0.75 and turn negative at α=0.90; perlayer_vector (+1.1%) is comparable to random_trainable (+0.8%).
+A per-layer vector is therefore the minimal sufficient scaffold.
+
### Experiment IDs (Phase 10)
- `prefit_threshold/`: Phase 10A prefit threshold curve
- `blend_dissection/`: Phase 10A.5 blend mechanism dissection
- `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
- `minimal_aux_compression/`: Phase 10A.7 minimal aux compression
+- `freeze_with_decay/`: Phase 10A.8A freeze with decay
+- `alpha_sweep_scaffold/`: Phase 10A.8B alpha sweep
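
Reviewer note: the arithmetic behind the two tables in this hunk can be re-checked with a short script. This is only a sketch under stated assumptions: it reads the `diff` column as percentage points relative to the DFA baseline of 31.2% mentioned in the 8A summary, and the names `DFA_BASELINE`, `FREEZE_RESULTS`, `ALPHA_SWEEP`, `diff_pp`, and `best_alpha` are hypothetical helpers, not identifiers from the experiment code. All numbers are copied from the tables as printed.

```python
# Sanity check for the Phase 10A.8 tables.
# Assumption: "diff" is in percentage points relative to the DFA baseline,
# which the note quotes as 31.2% (it is not a table entry).
DFA_BASELINE = 0.312

# branch -> (final score, reported diff in percentage points), from table 8A
FREEZE_RESULTS = {
    "random_trainable_075": (0.322, +1.1),
    "freeze1_fixed075": (0.146, -16.6),
    "freeze1_decay_to_000": (0.285, -2.7),
    "freeze5_decay_to_000": (0.285, -2.6),
}

# method -> {alpha: reported gain in percentage points}, from table 8B
ALPHA_SWEEP = {
    "perlayer_vector": {0.25: 0.0, 0.50: 0.6, 0.75: 1.1, 0.90: -1.4},
    "random_trainable": {0.25: 0.1, 0.50: 0.4, 0.75: 0.8, 0.90: -0.1},
}


def diff_pp(final, baseline=DFA_BASELINE):
    """Difference between a final score and the baseline, in percentage points."""
    return round((final - baseline) * 100, 1)


def best_alpha(gains):
    """Alpha value with the largest reported gain."""
    return max(gains, key=gains.get)


if __name__ == "__main__":
    for branch, (final, reported) in FREEZE_RESULTS.items():
        print(f"{branch}: recomputed {diff_pp(final):+.1f} pp, reported {reported:+.1f} pp")
    for method, gains in ALPHA_SWEEP.items():
        print(f"{method}: best alpha = {best_alpha(gains)}")
```

Because the table values are rounded to three decimals, the recomputed differences can deviate from the reported ones by about 0.1 percentage points; larger gaps would suggest the `diff` column uses a different reference than assumed here.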