diff options
| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-27 14:25:00 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-27 14:25:00 -0500 |
| commit | 65d97ad1ef4b552103420e6501655df192c98d57 (patch) | |
| tree | b3638d4fd4c8eb0f61e57d44dd41d553d33e6d85 /NOTE.md | |
| parent | b4e3cbeae6cb4cf4a4b69b84a475afcd7d7e9dbe (diff) | |
Add Phase 10A.7: minimal aux compression — continuous trainability is essential
8-branch dissection:
- zero_target + normmatched both crash: non-zero direction necessary, not norm
- perlayer_vector: +0.7% (per-block trainable vector works, network not required)
- freeze_after_{1,5,10}: ALL crash to ~13-14% (continuous trainability essential)
- random_trainable: +1.0% (reference)
Minimal mechanism: continuously trainable, non-zero, depth-aware auxiliary perturbation.
Freezing at ANY point destroys the benefit entirely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'NOTE.md')
| -rw-r--r-- | NOTE.md | 20 |
1 files changed, 20 insertions, 0 deletions
@@ -604,7 +604,27 @@ Phase 9A's +1.5% was not evidence of useful credit — it was an optimization dy Not semantic credit. Not pure trainability (zero_target crashes). Not state-dependent (time_only works). Depth-awareness is the minimal requirement (constant_input fails). +### Phase 10A.7: Minimal Auxiliary Compression + +| Branch | final | diff | Key insight | +|--------|-------|------|-------------| +| random_trainable | 0.321 | +1.0% | reference | +| zero_target | 0.203 | -10.8% | must output non-zero (confirmed) | +| zero_target_normmatched | 0.202 | -10.9% | norm matching doesn't save it | +| **perlayer_vector** | **0.318** | **+0.7%** | per-block trainable vector works! | +| freeze_after_1 | 0.144 | -16.7% | freeze = crash | +| freeze_after_5 | 0.143 | -16.8% | freeze = crash | +| freeze_after_10 | 0.130 | -18.1% | freeze = crash | + +**Key findings**: +1. Norm-matched zero-target still crashes → non-zero direction is necessary, not just norm +2. Per-layer trainable vector works (+0.7%) → network not strictly needed, but helps +3. ALL freeze-after-k crash → **continuous trainability is absolutely necessary** + +**Minimal mechanism**: continuously trainable, non-zero, depth-aware auxiliary perturbation. + ### Experiment IDs (Phase 10) - `prefit_threshold/`: Phase 10A prefit threshold curve - `blend_dissection/`: Phase 10A.5 blend mechanism dissection - `structured_aux/`: Phase 10A.6 structured vs semantic auxiliary +- `minimal_aux_compression/`: Phase 10A.7 minimal aux compression |
