author: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-26 22:07:35 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-26 22:07:35 -0500
commit: b4e3cbeae6cb4cf4a4b69b84a475afcd7d7e9dbe
tree: fca5a27504471091eba74a8f7efe2cf48eb85826 /NOTE.md
parent: 610e1169e19378cccd2d9b92a588c24dca7f3df7
Add Phase 10A.6: gain requires trainable depth-aware aux, not semantic credit
9-branch dissection results:
- zero_target crashes (-9.1%): aux must output non-zero
- constant_input neutral (+0.0%): needs at least depth info
- time_only works (+1.0%): h_l not needed, just depth index
- shuffled/fresh_random work (+1.3-1.4%): no semantic content needed
- prefit60_trainable ≈ random_trainable: prefit adds nothing
- All frozen branches crash: trainability is essential

Mechanism: depth-aware trainable auxiliary perturbation that diversifies block-local updates. Not semantic credit, not pure trainability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'NOTE.md')
-rw-r--r-- NOTE.md 20
1 files changed, 19 insertions, 1 deletions
diff --git a/NOTE.md b/NOTE.md
index 62deba6..0e8de0a 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -5,7 +5,7 @@
- **pilot**: Controlled iteration (commits 0b9ebb2, 7baf7ae)
- **frozen**: Code at commit 0b9ebb2 for all reported results
-## Status: PHASE 10A.5 — BLEND GAIN IS IMPLICIT REGULARIZATION, NOT LEARNED CREDIT
+## Status: PHASE 10A.6 — GAIN REQUIRES TRAINABLE DEPTH-AWARE AUX, NOT SEMANTIC CREDIT
---
@@ -587,6 +587,24 @@ Trainable Vec helps even with shuffled targets. Gaussian noise and norm scaling
Phase 9A's +1.5% was not evidence of useful credit — it was an optimization dynamics effect.
+### Phase 10A.6: Structured vs Semantic Auxiliary
+
+| Branch | final | diff | Key insight |
+|--------|-------|------|-------------|
+| random_trainable | 0.324 | +1.2% | works |
+| shuffled_trainable | 0.325 | +1.4% | no semantics needed |
+| **zero_target** | **0.221** | **-9.1%** | must output non-zero |
+| fresh_random_target | 0.325 | +1.3% | stable targets not needed |
+| time_only | 0.321 | +1.0% | h_l not needed, just depth |
+| **constant_input** | **0.312** | **+0.0%** | needs at least depth info |
+| prefit60_frozen | 0.127 | -18.4% | frozen = crash |
+| prefit60_trainable | 0.321 | +1.0% | prefit ≈ random init |
+
+**Mechanism**: depth-aware trainable auxiliary perturbation that diversifies block-local updates.
+Not semantic credit. Not pure trainability (zero_target crashes). Not state-dependent (time_only works).
+Depth-awareness is the minimal requirement (constant_input fails).
+
### Experiment IDs (Phase 10)
- `prefit_threshold/`: Phase 10A prefit threshold curve
- `blend_dissection/`: Phase 10A.5 blend mechanism dissection
+- `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
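
The `diff` column in the Phase 10A.6 table does not say whether it is a relative change or an absolute difference in percentage points, and the note does not state the baseline. The sketch below checks both readings against the table rows. It assumes a baseline of 0.312, inferred only from the `constant_input` row (final 0.312, diff +0.0%); the baseline value and the helper names are illustrative assumptions, not taken from the repository.

```python
# Rows copied from the Phase 10A.6 table: (branch, final, reported diff in %).
ROWS = [
    ("random_trainable", 0.324, 1.2),
    ("shuffled_trainable", 0.325, 1.4),
    ("zero_target", 0.221, -9.1),
    ("fresh_random_target", 0.325, 1.3),
    ("time_only", 0.321, 1.0),
    ("constant_input", 0.312, 0.0),
    ("prefit60_frozen", 0.127, -18.4),
    ("prefit60_trainable", 0.321, 1.0),
]

# ASSUMPTION: the note never states the baseline; 0.312 is inferred from
# the constant_input row, whose reported diff is +0.0%.
ASSUMED_BASELINE = 0.312


def diff_points(final, baseline=ASSUMED_BASELINE):
    """Absolute difference from the baseline, in percentage points."""
    return (final - baseline) * 100.0


def diff_relative(final, baseline=ASSUMED_BASELINE):
    """Relative change from the baseline, in percent."""
    return (final / baseline - 1.0) * 100.0


def max_abs_error(fn):
    """Largest gap between a candidate reading and the reported diff column."""
    return max(abs(fn(final) - reported) for _, final, reported in ROWS)


if __name__ == "__main__":
    for name, final, reported in ROWS:
        print(f"{name:22s} reported={reported:+6.1f} "
              f"points={diff_points(final):+6.1f} "
              f"relative={diff_relative(final):+6.1f}")
    print("max error (points):  ", round(max_abs_error(diff_points), 2))
    print("max error (relative):", round(max_abs_error(diff_relative), 2))
```

Under the assumed baseline, the percentage-point reading matches every row to within rounding of the three-decimal `final` values, while the relative reading is off by tens of percent on the crashing branches. If the true baseline differs from 0.312, the comparison should be rerun with the actual value.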