Diffstat (limited to 'NOTE.md')
| -rw-r--r-- | NOTE.md | 20 |
1 files changed, 19 insertions, 1 deletions
@@ -5,7 +5,7 @@
 - **pilot**: Controlled iteration (commits 0b9ebb2, 7baf7ae)
 - **frozen**: Code at commit 0b9ebb2 for all reported results
 
-## Status: PHASE 10A.5 — BLEND GAIN IS IMPLICIT REGULARIZATION, NOT LEARNED CREDIT
+## Status: PHASE 10A.6 — GAIN REQUIRES TRAINABLE DEPTH-AWARE AUX, NOT SEMANTIC CREDIT
 
 ---
 
@@ -587,6 +587,24 @@ Trainable Vec helps even with shuffled targets. Gaussian noise and norm scaling
 
 Phase 9A's +1.5% was not evidence of useful credit — it was an optimization dynamics effect.
 
+### Phase 10A.6: Structured vs Semantic Auxiliary
+
+| Branch | final | diff | Key insight |
+|--------|-------|------|-------------|
+| random_trainable | 0.324 | +1.2% | works |
+| shuffled_trainable | 0.325 | +1.4% | no semantics needed |
+| **zero_target** | **0.221** | **-9.1%** | must output non-zero |
+| fresh_random_target | 0.325 | +1.3% | stable targets not needed |
+| time_only | 0.321 | +1.0% | h_l not needed, just depth |
+| **constant_input** | **0.312** | **+0.0%** | needs at least depth info |
+| prefit60_frozen | 0.127 | -18.4% | frozen = crash |
+| prefit60_trainable | 0.321 | +1.0% | prefit ≈ random init |
+
+**Mechanism**: depth-aware trainable auxiliary perturbation that diversifies block-local updates.
+Not semantic credit. Not pure trainability (zero_target crashes). Not state-dependent (time_only works).
+Depth-awareness is the minimal requirement (constant_input fails).
+
 ### Experiment IDs (Phase 10)
 - `prefit_threshold/`: Phase 10A prefit threshold curve
 - `blend_dissection/`: Phase 10A.5 blend mechanism dissection
+- `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
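
Reading the `diff` column of the Phase 10A.6 table: the note does not say how it is computed. It looks consistent with an absolute difference in percentage points from a baseline near 0.312, the `final` value of the `constant_input` row, which reports +0.0%. A relative reading would put `zero_target` at roughly -29%, not -9.1%. The sketch below checks this interpretation against the table rows. The baseline value and the names `ROWS`, `ASSUMED_BASELINE`, `point_diff` and `max_mismatch` are assumptions made for illustration, not taken from the repository.

```python
# Rows copied from the Phase 10A.6 table: (branch, final, reported diff in %).
ROWS = [
    ("random_trainable", 0.324, 1.2),
    ("shuffled_trainable", 0.325, 1.4),
    ("zero_target", 0.221, -9.1),
    ("fresh_random_target", 0.325, 1.3),
    ("time_only", 0.321, 1.0),
    ("constant_input", 0.312, 0.0),
    ("prefit60_frozen", 0.127, -18.4),
    ("prefit60_trainable", 0.321, 1.0),
]

# ASSUMPTION: the baseline is inferred from the row that reports +0.0%;
# the note itself does not state the baseline value.
ASSUMED_BASELINE = 0.312


def point_diff(final, baseline=ASSUMED_BASELINE):
    """Absolute difference from the baseline, in percentage points."""
    return round((final - baseline) * 100, 1)


def max_mismatch(rows=ROWS):
    """Largest gap between the recomputed and the reported diff."""
    return max(abs(point_diff(final) - reported) for _, final, reported in rows)


if __name__ == "__main__":
    for name, final, reported in ROWS:
        print(f"{name:20s} recomputed {point_diff(final):+6.1f}  reported {reported:+6.1f}")
```

Under this assumption every row is reproduced to within 0.1 points. The remaining gaps would be expected if the reported diffs were computed from unrounded `final` values.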
