summaryrefslogtreecommitdiff
path: root/NOTE.md
diff options
context:
space:
mode:
authorYurenHao0426 <Blackhao0426@gmail.com>2026-03-27 14:25:00 -0500
committerYurenHao0426 <Blackhao0426@gmail.com>2026-03-27 14:25:00 -0500
commit65d97ad1ef4b552103420e6501655df192c98d57 (patch)
treeb3638d4fd4c8eb0f61e57d44dd41d553d33e6d85 /NOTE.md
parentb4e3cbeae6cb4cf4a4b69b84a475afcd7d7e9dbe (diff)
Add Phase 10A.7: minimal aux compression — continuous trainability is essential
8-branch dissection: - zero_target + normmatched both crash: non-zero direction necessary, not norm - perlayer_vector: +0.7% (per-block trainable vector works, network not required) - freeze_after_{1,5,10}: ALL crash to ~13-14% (continuous trainability essential) - random_trainable: +1.0% (reference) Minimal mechanism: continuously trainable, non-zero, depth-aware auxiliary perturbation. Freezing at ANY point destroys the benefit entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'NOTE.md')
-rw-r--r--NOTE.md20
1 files changed, 20 insertions, 0 deletions
diff --git a/NOTE.md b/NOTE.md
index 0e8de0a..2882547 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -604,7 +604,27 @@ Phase 9A's +1.5% was not evidence of useful credit — it was an optimization dy
Not semantic credit. Not pure trainability (zero_target crashes). Not state-dependent (time_only works).
Depth-awareness is the minimal requirement (constant_input fails).
+### Phase 10A.7: Minimal Auxiliary Compression
+
+| Branch | final | diff | Key insight |
+|--------|-------|------|-------------|
+| random_trainable | 0.321 | +1.0% | reference |
+| zero_target | 0.203 | -10.8% | must output non-zero (confirmed) |
+| zero_target_normmatched | 0.202 | -10.9% | norm matching doesn't save it |
+| **perlayer_vector** | **0.318** | **+0.7%** | per-block trainable vector works! |
+| freeze_after_1 | 0.144 | -16.7% | freeze = crash |
+| freeze_after_5 | 0.143 | -16.8% | freeze = crash |
+| freeze_after_10 | 0.130 | -18.1% | freeze = crash |
+
+**Key findings**:
+1. Norm-matched zero-target still crashes → non-zero direction is necessary, not just norm
+2. Per-layer trainable vector works (+0.7%) → network not strictly needed, but helps
+3. ALL freeze-after-k crash → **continuous trainability is absolutely necessary**
+
+**Minimal mechanism**: continuously trainable, non-zero, depth-aware auxiliary perturbation.
+
### Experiment IDs (Phase 10)
- `prefit_threshold/`: Phase 10A prefit threshold curve
- `blend_dissection/`: Phase 10A.5 blend mechanism dissection
- `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary
+- `minimal_aux_compression/`: Phase 10A.7 minimal aux compression