| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 01:33:00 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 01:33:00 -0500 |
| commit | 2ca87f2bd4449b1d4ac715d8cf4fb5f20b7afdd8 (patch) | |
| tree | 34adf68554034c97e344f356d305e91df03c43f0 /protocol | |
| parent | 02252d942dbf449276059c49260ec0994c4f9a5d (diff) | |
PAPER_OUTLINE: §4 rewrite under 'two distinct failure modes' framing
After the round 19 disambiguation experiment confirmed hypothesis B
(penalty CREATES deep alignment, not just reveals it), the paper §4
needs to use the new framing:
Mode 1: measurement degeneracy via terminal LN gradient cancellation
Mode 2: low intrinsic credit-direction quality of random feedback
Both modes are directly measured (mode 1 by diagnostic (b), mode 2 by
per-layer cos in the meaningful regime). The penalty partially
alleviates BOTH modes. Neither is fully fixed.
§4 rewrite includes:
- The two modes (4.1)
- Penalty causal validation with 3-seed cos (4.2)
- Disambiguation: vanilla early-epoch cos table proving hypothesis B (4.3)
- Why the residual gap is partial alignment (4.4)
- Why this framing is paper-cleaner than prior ones (4.5)
Walk-back chain extended to 7 entries, with 6 and 7 happening same-day
and converging on the final two-distinct-modes framing.
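
The floor check that the message attributes to diagnostic (b) can be illustrated with a minimal sketch. It assumes the check amounts to refusing to report a cosine when the norm of the BP reference gradient falls below a floor. The function names and the `None` return convention are hypothetical, and the 1e-7 default is the floor quoted in the outline diff of this commit, not a value read from the repository's code.

```python
import math

# Floor quoted in the outline text; treated here as a tunable assumption.
GRAD_NORM_FLOOR = 1e-7


def l2_norm(v):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def guarded_cosine(credit, bp_grad, floor=GRAD_NORM_FLOOR):
    """Cosine of a credit vector to the BP gradient, or None if uninterpretable.

    Returns None when the BP gradient norm is below `floor` (the reference
    vector is at the numerical floor) or when the credit vector is all zeros.
    """
    g_norm = l2_norm(bp_grad)
    c_norm = l2_norm(credit)
    if g_norm < floor or c_norm == 0.0:
        return None
    dot = sum(c * g for c, g in zip(credit, bp_grad))
    return dot / (c_norm * g_norm)


if __name__ == "__main__":
    # A reference gradient of norm 1e-10 is rejected rather than scored.
    print(guarded_cosine([1.0, 0.0], [1e-10, 0.0]))
    # A reference gradient in the meaningful regime is scored normally.
    print(guarded_cosine([1.0, 0.0], [2.0, 0.0]))
```

Returning `None` instead of a number keeps a floor-regime measurement from being averaged into a per-layer mean by accident; whether the actual pipeline does this is not stated in the commit.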
Diffstat (limited to 'protocol')
| -rw-r--r-- | protocol/PAPER_OUTLINE.md | 87 |
1 files changed, 69 insertions, 18 deletions
diff --git a/protocol/PAPER_OUTLINE.md b/protocol/PAPER_OUTLINE.md
index f3b19d6..6d1549e 100644
--- a/protocol/PAPER_OUTLINE.md
+++ b/protocol/PAPER_OUTLINE.md
@@ -99,34 +99,81 @@ BP never fires on any of 9 BP×architecture×seed conditions.
 The same method falls into different sub-modes on different seeds. (c) is for interpretation; (b) is the primary detector regardless of sub-mode.

-## §4 Two failure modes
+## §4 Two distinct failure modes (round 19 final framing)

-### 4.1 Mechanism story (the discovery layer)
+### 4.1 The two modes

-**Failure mode 1: residual-stream-amplified BP-grad collapse**
+**Mode 1 — measurement degeneracy via terminal LayerNorm gradient cancellation**
 - DFA's local block losses have no global scale constraint
 - Block parameters grow ~95× rel-delta on ResMLP (vs BP ~2.7×)
-- w1·w2 product per block ~5×10⁴; block outputs grow 10⁷-10⁸×
 - Residual stream ‖h_L‖ ~ 10⁸ on ResMLP / ViT (4 OOM)
-- Terminal LayerNorm Jacobian rescaling drives ‖g_L‖ to ~10⁻¹⁰ (5 OOM below BP)
-- Γ becomes a measurement of cosine to a numerical-floor reference vector
+- Terminal LayerNorm Jacobian rescaling drives ‖g_L‖ to ~10⁻¹⁰ (below the F.cosine_similarity eps clamp and well below the 10⁻⁷ floor)
+- The cosine alignment metric Γ is then computed against a numerical-floor reference vector — the value is mathematically defined but uninterpretable

-This is what diagnostics (a) and (b) detect.
+**This is caught by diagnostic (b)**: ‖g_L‖ floor check.

-**Failure mode 2: depth utilization**
-- Even after the scale pathology is corrected, the deep blocks may not contribute over a random-untrained-blocks baseline
-- Diagnostic (d) detects this via the frozen-blocks comparison
-- **[Round 18 caveat]** The (d) verdict on penalized DFA depends on the intervention strength (λ): at λ=1e-2 the margin is +1.4 ± 0.05 pp; at λ=1e-3 it is +2.3 pp. There is a **real tradeoff between penalty strength and depth utilization** — stronger penalty contains scale better but kills more depth contribution; weaker penalty preserves depth but keeps more scale pathology.
+**Mode 2 — low intrinsic credit-direction quality of random feedback**
+- Even when the BP gradient at hidden layers is in the meaningful regime (vanilla DFA at epoch 1, ‖g_L‖ ≈ 6×10⁻⁷), the deep-layer cosine of `e_T B_l^T` to BP grad is **essentially zero** (l1-l4 in [-0.05, +0.02] across vanilla ep 1-5)
+- This is not measurement noise: the same checkpoint shows l0 cos = +0.42, well above noise. The deep blocks specifically have zero alignment.
+- Random feedback projects the error signal in directions that are largely uncorrelated with the per-layer BP gradient, on the deep blocks of pre-LayerNorm residual networks

-### 4.2 Causal validation: penalty rescue
+**This is caught by direct per-layer cosine measurement** (in the meaningful regime).

-On 4-block d=256 ResMLP, adding `λ ‖f_l(h_l)‖²` to each DFA local block loss:
-- λ=1e-2 (3 seeds): ‖h_L‖ 4×10⁸ → 4×10⁴ (4 OOM rescue), ‖g_L‖ 5×10⁻¹⁰ → ~10⁻⁶ (4 OOM rescue), acc 0.308 → 0.363 (+5.5 pp over vanilla, +1.4 pp over shallow)
-- λ=1e-3: similar magnitude rescue, acc 0.372 (+2.3 pp over shallow, single seed; multi-seed verification in progress)
+### 4.2 Causal validation: penalty rescue partially alleviates BOTH modes

-**Round 18 framing**: this **partially dissociates the two putative failure modes by intervention**. The penalty alleviates the scale-related diagnostics (a) and (b) but does not bring depth contribution in line with BP. The two failure modes expose **distinct intervention surfaces**.
+On 4-block d=256 ResMLP, adding `λ ‖f_l(h_l)‖²` to each DFA local block loss (3 seeds at λ=1e-2, 30 epochs):

-Full mechanistic separability requires direct deep-block credit-quality measurement on the penalized checkpoint (in progress).
+**Mode 1 alleviation** (residual stream + BP grad):
+- ‖h_L‖: 4×10⁸ → ~4×10⁴ (4 OOM rescue, 3 seeds)
+- ‖g_L‖: 5×10⁻¹⁰ → ~10⁻⁶ (4 OOM rescue, 3 seeds)
+- Diagnostic (b) passes after the penalty
+
+**Mode 2 alleviation** (deep credit alignment):
+- Vanilla deep-layer cos (l1-l4): essentially zero
+- Penalized deep-layer cos (l1-l4) 3-seed mean: **+0.155 ± 0.025**
+- Null calibration with 20 fresh random Bs: deep cos = +0.002 ± 0.022 — confirms the +0.155 is real signal that the network adapted to its specific training Bs
+- Diagnostic measurement of mode 2 is now in a partially-alleviated regime, but +0.155 is still much less than BP's self-cos of 1.0
+
+**Both modes are partially alleviated, neither fully**:
+- Penalty acc: 0.363 ± 0.0007 (3 seeds, lam=1e-2)
+- DFA-vanilla acc: 0.308 ± 0.014 (3 seeds)
+- DFA-shallow baseline: 0.349 ± 0.002
+- BP-trainable acc: 0.609 ± 0.004
+
+Penalty rescue is +5.5 pp over vanilla (mode 1 alleviated) and +1.4 pp over shallow (mode 2 partially alleviated). The remaining 24 pp gap to BP reflects that mode 2 is only partially fixed: cos +0.155 is real but well below BP's 1.0.
+
+### 4.3 Disambiguating "penalty revealed" vs "penalty created" the alignment
+
+Round 19 disambiguation experiment: trained vanilla DFA s42 for 5 epochs and saved checkpoints at each. Measured deep-layer cos at every checkpoint:
+
+| epoch | ‖g_L‖ | meaningful? | l1 cos | l2 cos | l3 cos | l4 cos |
+|---:|---:|:---:|---:|---:|---:|---:|
+| 1 | 1.4e-6 | yes | +0.005 | -0.028 | -0.039 | -0.038 |
+| 2 | 3.2e-7 | yes | -0.002 | -0.040 | -0.055 | -0.054 |
+| 3 | 1.3e-7 | borderline | +0.007 | -0.039 | -0.054 | -0.054 |
+| 4 | 6.8e-8 | no | +0.013 | -0.034 | -0.052 | -0.052 |
+| 5 | 4.3e-8 | no | +0.016 | -0.036 | -0.055 | -0.055 |
+
+**Even at epoch 1, where ‖g_L‖ is well above the floor (in the meaningful measurement regime), the deep cosines are essentially zero.** Compare to penalized DFA at ep 30: deep cos ~+0.17.
+
+This means the penalty intervention **created** the +0.17 alignment, it did not just make a previously-hidden alignment measurable. The mechanism is plausibly: with no penalty, the inner product `<f_l, e_T B_l^T>` can be increased indefinitely by inflating ‖f_l‖, so the optimizer pushes parameters in directions uncorrelated with BP grad. With the penalty, ‖f_l‖ is constrained, so the optimizer must instead orient the direction of `f_l` more carefully, which incidentally yields better (partial) alignment with BP grad.
+
+This is the strongest causal evidence we have: **the two modes are mechanistically distinct, and the penalty's role is not just numerical (preventing collapse) but training-trajectory-altering (creating partial alignment that wasn't there in vanilla)**.
+
+### 4.4 Partial alleviation explains the residual gap
+
+The remaining 24 pp gap from penalized DFA (0.36) to BP-trainable (0.61) is explained by **partial credit alignment**: deep cos +0.17 vs BP's 1.0. A network trained with credit signals that are 17% aligned with the true gradient gets ~60% of BP's accuracy. The relationship between cos quality and accuracy is plausibly monotonic but not necessarily linear.
+
+The (d) diagnostic margin (penalty +1.4 pp over shallow) is consistent with this picture: the deep blocks contribute *some* useful signal (because cos > 0), but the magnitude of the contribution is small.
+
+### 4.5 Why this framing is paper-cleaner
+
+The new framing has several improvements over the original "scale + direction" claim:
+- **Empirically grounded**: both modes are directly measured (not inferred from observable proxies)
+- **Honest about measurement**: mode 2 is only measurable in the meaningful regime (i.e., after mode 1 is alleviated), and we say so explicitly
+- **Causal control**: the vanilla early-epoch checkpoint sweep disambiguates "penalty revealed" vs "penalty created"
+- **Null calibration**: fresh-Bs control rules out measurement artifacts
+- **Avoids the "two failure modes via (d)" claim**: (d) is now reframed as a depth-utilization measure, not a credit-quality test

 ## §5 Pipeline pitfalls catalog (appendix)

@@ -152,9 +199,13 @@ We walked back our own claims multiple times during this work. Reporting these e

 1. "DFA trains ViT-Mini to 24% accuracy" → walked back to "DFA-frozen-random-blocks ViT also gets 25.7%; the deep blocks are passengers" (codex round 6)
 2. "DFA trains ResMLP to 31%" → walked back to "DFA-trainable is 4 pp BELOW DFA-shallow on ResMLP; DFA training the deep blocks actively destroys value" (codex round 8)
-3. "Penalty rescues to 36.3% above shallow baseline → second failure mode established by (d)" → softened to "the (d) verdict on penalized DFA depends on the intervention strength; the two failure modes are partially dissociated by intervention" (codex round 18)
+3. "Penalty rescues to 36.3% above shallow baseline → second failure mode established by (d)" → softened to "the (d) verdict on penalized DFA depends on the intervention strength" (codex round 18)
 4. "Layer 0 always dominates Γ" → softened to "aggregation hides per-layer structure that depends on the architecture; on ResMLP layer 0 dominates, on ViT-Mini all layers are uniformly near zero" (round 18 follow-up)
 5. "(b) is causally specific to terminal LN" → softened to "(b) appears restricted to terminal-normalized architectures we audited" (round 18 follow-up)
+6. "Two failure modes (scale + direction)" → "one unified failure mode (scale → LN → measurement degeneracy)" (cos +0.17 walk-back, 2026-04-08, before disambiguation experiment)
+7. **"One unified failure mode" → "two distinct failure modes (measurement degeneracy + low intrinsic random-feedback alignment quality)"** (round 19 disambiguation, 2026-04-08, after vanilla early-epoch checkpoint sweep showed deep cos ~0 even in meaningful regime)
+
+The walk-back chain converges on a more honest framing each time. Walk-backs 6 and 7 happened in the same day; both are documented in memory and reflected here. The final two-distinct-modes framing is empirically grounded with direct measurement, null calibration, and a causal disambiguation control.

 ## Open experimental questions

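
The null calibration described in §4.2 of the diff (deep cosine recomputed with 20 fresh random Bs) can be sketched on synthetic data. Everything below is a toy under stated assumptions: the helper names, the dimensions, the Gaussian feedback matrices, and the synthetic reference gradient built to be partially aligned with one specific feedback matrix are hypothetical and are not taken from the repository. The sketch only shows the logic of the control: a cosine that is present with the training B but vanishes under fresh Bs is specific to that B.

```python
import math
import random


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def unit(v):
    """Rescale a vector to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def random_feedback(rng, d, n_out):
    """Hypothetical Gaussian feedback matrix B with shape (d, n_out)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(d)]


def feedback_credit(e, B):
    """Credit vector e B^T: one entry per row of B."""
    return [sum(ek * bk for ek, bk in zip(e, row)) for row in B]


def null_cosines(e, ref_grad, n_draws=20, seed=0):
    """Cosine of e B^T to ref_grad for n_draws freshly sampled B matrices."""
    rng = random.Random(seed)
    d, n_out = len(ref_grad), len(e)
    return [
        cosine(feedback_credit(e, random_feedback(rng, d, n_out)), ref_grad)
        for _ in range(n_draws)
    ]


# Synthetic toy setup (not experiment data).
rng = random.Random(123)
d, n_out = 512, 10
e = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
B_train = random_feedback(rng, d, n_out)
credit = feedback_credit(e, B_train)

# Reference "gradient": unit noise plus a small component along the training credit.
noise = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
ref_grad = [0.3 * c + n for c, n in zip(unit(credit), noise)]

trained_cos = cosine(credit, ref_grad)
null = null_cosines(e, ref_grad)
null_mean = sum(null) / len(null)
null_std = math.sqrt(sum((x - null_mean) ** 2 for x in null) / len(null))
print(f"training-B cos = {trained_cos:+.3f}, fresh-B null = {null_mean:+.3f} +/- {null_std:.3f}")
```

In this toy the spread of the null scales roughly as one over the square root of the vector dimension, so the width of the null band depends on how the real measurement flattens batch and feature dimensions, which the commit does not specify.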