author: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-27 16:39:17 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-27 16:39:17 -0500
commit: 4d6e689fe6bfffef6db7a4650aec210cd3eeed5c
tree: fa8b6d123a51bab4b17a07f787cc89e74584397f /NOTE.md
parent: 65d97ad1ef4b552103420e6501655df192c98d57

Add Phase 10A.8: freeze-with-decay confirms stale aux is main freeze failure cause; alpha sweep shows perlayer_vector at alpha=0.75 matches full network

10A.8A: freeze_decay_to_000 recovers to 28.5% (vs 14.6% fixed freeze): stale high-weight aux is the primary cause of freeze crashes. But 28.5% < DFA 31.2% confirms continuous trainability adds ~2.7% independent value.

10A.8B: Both perlayer_vector and random_trainable optimal at alpha=0.75. perlayer_vector +1.1% vs random_trainable +0.8%: per-layer vector is the minimal sufficient scaffold, no network needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat (limited to 'NOTE.md')
-rw-r--r-- NOTE.md 26
1 files changed, 26 insertions, 0 deletions
diff --git a/NOTE.md b/NOTE.md
index 2882547..90057af 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -623,8 +623,34 @@ Depth-awareness is the minimal requirement (constant_input fails).
**Minimal mechanism**: continuously trainable, non-zero, depth-aware auxiliary perturbation.
+### Phase 10A.8: Scaffold Dynamics
+
+**8A: Freeze with Decay**
+
+| Branch | Final | Δ vs DFA | Key |
+|--------|-------|----------|-----|
+| random_trainable_075 | 0.322 | +1.1% | reference |
+| freeze1_fixed075 | 0.146 | -16.6% | stale aux at 75% kills |
+| freeze1_decay_to_000 | **0.285** | **-2.7%** | decay to DFA recovers most |
+| freeze5_decay_to_000 | 0.285 | -2.6% | same |
+
+Freeze failure is mostly due to stale high-weight aux: decay_to_000 recovers from 14.6% (fixed freeze) to 28.5%.
+But 28.5% < DFA 31.2% → continuous trainability adds ~2.7 percentage points of additional value.
+
+**8B: Alpha Sweep**
+
+| Method | α=0.25 | α=0.50 | α=0.75 | α=0.90 |
+|--------|--------|--------|--------|--------|
+| perlayer_vector | +0.0% | +0.6% | **+1.1%** | -1.4% |
+| random_trainable | +0.1% | +0.4% | **+0.8%** | -0.1% |
+
+Both methods optimal at α=0.75. perlayer_vector (+1.1%) ≈ random_trainable (+0.8%).
+Per-layer vector is the minimal sufficient scaffold.
+
### Experiment IDs (Phase 10)
- `prefit_threshold/`: Phase 10A prefit threshold curve
- `blend_dissection/`: Phase 10A.5 blend mechanism dissection
- `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
- `minimal_aux_compression/`: Phase 10A.7 minimal aux compression
+- `freeze_with_decay/`: Phase 10A.8A freeze with decay
+- `alpha_sweep_scaffold/`: Phase 10A.8B alpha sweep
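
The arithmetic behind the 8A and 8B conclusions can be re-checked with a short script. This is only a sketch: the numbers are copied from the tables in the diff above, the 31.2% DFA baseline is the figure quoted in the note, and every name (`DFA_BASELINE`, `delta_pp`, `recovered_fraction`, `best_alpha`) is hypothetical and does not come from the repository. Because the `final` values are rounded to three decimals, recomputed deltas may differ from the table by about 0.1. The "recovered fraction" is just one possible way to quantify "recovers most", not a metric the note defines.

```python
# Values transcribed from the Phase 10A.8 tables; all names here are illustrative.
DFA_BASELINE = 0.312  # "DFA 31.2%" as quoted in the note

final_8a = {
    "random_trainable_075": 0.322,
    "freeze1_fixed075": 0.146,
    "freeze1_decay_to_000": 0.285,
    "freeze5_decay_to_000": 0.285,
}


def delta_pp(final, baseline=DFA_BASELINE):
    """Difference from the baseline, in percentage points (one decimal)."""
    return round((final - baseline) * 100, 1)


# 8A: per-branch difference from the DFA baseline.
deltas_8a = {name: delta_pp(value) for name, value in final_8a.items()}

# 8A: share of the fixed-freeze drop that decaying the aux wins back
# (assumed definition: recovered gap divided by total gap to DFA).
fixed = final_8a["freeze1_fixed075"]
decayed = final_8a["freeze1_decay_to_000"]
recovered_fraction = (decayed - fixed) / (DFA_BASELINE - fixed)

# 8B: deltas (percentage points) per method and alpha, from the sweep table.
sweep_8b = {
    "perlayer_vector": {0.25: 0.0, 0.50: 0.6, 0.75: 1.1, 0.90: -1.4},
    "random_trainable": {0.25: 0.1, 0.50: 0.4, 0.75: 0.8, 0.90: -0.1},
}

# Alpha with the largest delta for each method.
best_alpha = {method: max(row, key=row.get) for method, row in sweep_8b.items()}

if __name__ == "__main__":
    for name, delta in deltas_8a.items():
        print(f"{name}: {delta:+.1f} pp vs DFA")
    print(f"recovered fraction of freeze drop: {recovered_fraction:.2f}")
    print(f"best alpha per method: {best_alpha}")
```

Under this definition, decay recovers a bit over four fifths of the drop caused by the fixed freeze, and the remaining gap to DFA is the ~2.7 points the note attributes to continuous trainability.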