PAPER_OUTLINE: §4 rewrite under 'two distinct failure modes' framing

After the round 19 disambiguation experiment confirmed hypothesis B (penalty CREATES deep alignment, not just reveals it), the paper §4 needs to use the new framing:

  Mode 1: measurement degeneracy via terminal LN gradient cancellation
  Mode 2: low intrinsic credit-direction quality of random feedback

Both modes are direct-measured (mode 1 by diagnostic (b), mode 2 by per-layer cos in the meaningful regime). The penalty partially alleviates BOTH modes. Neither is fully fixed.

§4 rewrite includes:
- The two modes (4.1)
- Penalty causal validation with 3-seed cos (4.2)
- Disambiguation: vanilla early-epoch cos table proving hypothesis B (4.3)
- Why the residual gap is partial alignment (4.4)
- Why this framing is paper-cleaner than prior ones (4.5)

Walk-back chain extended to 7 entries, with 6 and 7 happening same-day and converging on the final two-distinct-modes framing.
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 01:33:00 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 01:33:00 -0500
commit: 2ca87f2bd4449b1d4ac715d8cf4fb5f20b7afdd8 (patch)
tree: 34adf68554034c97e344f356d305e91df03c43f0 /protocol
parent: 02252d942dbf449276059c49260ec0994c4f9a5d (diff)
1 files changed, 69 insertions, 18 deletions
diff --git a/protocol/PAPER_OUTLINE.md b/protocol/PAPER_OUTLINE.md
index f3b19d6..6d1549e 100644
--- a/protocol/PAPER_OUTLINE.md
+++ b/protocol/PAPER_OUTLINE.md
@@ -99,34 +99,81 @@ BP never fires on any of 9 BP×architecture×seed conditions.
 
 The same method falls into different sub-modes on different seeds. (c) is for interpretation; (b) is the primary detector regardless of sub-mode.
 
-## §4 Two failure modes
+## §4 Two distinct failure modes (round 19 final framing)
 
-### 4.1 Mechanism story (the discovery layer)
+### 4.1 The two modes
 
-**Failure mode 1: residual-stream-amplified BP-grad collapse**
+**Mode 1 — measurement degeneracy via terminal LayerNorm gradient cancellation**
 - DFA's local block losses have no global scale constraint
 - Block parameters grow ~95× rel-delta on ResMLP (vs BP ~2.7×)
-- w1·w2 product per block ~5×10⁴; block outputs grow 10⁷-10⁸×
 - Residual stream ‖h_L‖ ~ 10⁸ on ResMLP / ViT (4 OOM)
-- Terminal LayerNorm Jacobian rescaling drives ‖g_L‖ to ~10⁻¹⁰ (5 OOM below BP)
-- Γ becomes a measurement of cosine to a numerical-floor reference vector
+- Terminal LayerNorm Jacobian rescaling drives ‖g_L‖ to ~10⁻¹⁰ (below the `F.cosine_similarity` eps clamp, 10⁻⁸ by default, and well below the 10⁻⁷ floor)
+- The cosine alignment metric Γ is then computed against a numerical-floor reference vector — the value is mathematically defined but uninterpretable
 
-This is what diagnostics (a) and (b) detect.
+**This is caught by diagnostic (b)**: ‖g_L‖ floor check.
 
-**Failure mode 2: depth utilization**
-- Even after the scale pathology is corrected, the deep blocks may not contribute over a random-untrained-blocks baseline
-- Diagnostic (d) detects this via the frozen-blocks comparison
-- **[Round 18 caveat]** The (d) verdict on penalized DFA depends on the intervention strength (λ): at λ=1e-2 the margin is +1.4 ± 0.05 pp; at λ=1e-3 it is +2.3 pp. There is a **real tradeoff between penalty strength and depth utilization** — stronger penalty contains scale better but kills more depth contribution; weaker penalty preserves depth but keeps more scale pathology.
+**Mode 2 — low intrinsic credit-direction quality of random feedback**
+- Even when the BP gradient at hidden layers is in the meaningful regime (vanilla DFA at epoch 1, ‖g_L‖ ≈ 6×10⁻⁷), the deep-layer cosine of `e_T B_l^T` to BP grad is **essentially zero** (l1-l4 in [-0.05, +0.02] across vanilla ep 1-5)
+- This is not measurement noise: the same checkpoint shows l0 cos = +0.42, well above noise. The deep blocks specifically have zero alignment.
+- On the deep blocks of pre-LayerNorm residual networks, random feedback projects the error signal along directions that are largely uncorrelated with the per-layer BP gradient
 
-### 4.2 Causal validation: penalty rescue
+**This is caught by direct per-layer cosine measurement** (in the meaningful regime).
 
-On 4-block d=256 ResMLP, adding `λ ‖f_l(h_l)‖²` to each DFA local block loss:
-- λ=1e-2 (3 seeds): ‖h_L‖ 4×10⁸ → 4×10⁴ (4 OOM rescue), ‖g_L‖ 5×10⁻¹⁰ → ~10⁻⁶ (4 OOM rescue), acc 0.308 → 0.363 (+5.5 pp over vanilla, +1.4 pp over shallow)
-- λ=1e-3: similar magnitude rescue, acc 0.372 (+2.3 pp over shallow, single seed; multi-seed verification in progress)
+### 4.2 Causal validation: penalty rescue partially alleviates BOTH modes
 
-**Round 18 framing**: this **partially dissociates the two putative failure modes by intervention**. The penalty alleviates the scale-related diagnostics (a) and (b) but does not bring depth contribution in line with BP. The two failure modes expose **distinct intervention surfaces**.
+On 4-block d=256 ResMLP, adding `λ ‖f_l(h_l)‖²` to each DFA local block loss (3 seeds at λ=1e-2, 30 epochs):
 
-Full mechanistic separability requires direct deep-block credit-quality measurement on the penalized checkpoint (in progress).
+**Mode 1 alleviation** (residual stream + BP grad):
+- ‖h_L‖: 4×10⁸ → ~4×10⁴ (4 OOM rescue, 3 seeds)
+- ‖g_L‖: 5×10⁻¹⁰ → ~10⁻⁶ (3-4 OOM rescue, 3 seeds)
+- Diagnostic (b) passes after the penalty
+
+**Mode 2 alleviation** (deep credit alignment):
+- Vanilla deep-layer cos (l1-l4): essentially zero
+- Penalized deep-layer cos (l1-l4) 3-seed mean: **+0.155 ± 0.025**
+- Null calibration with 20 fresh random Bs: deep cos = +0.002 ± 0.022, which confirms that the +0.155 is real signal arising from the network adapting to its specific training Bs
+- Mode 2 is therefore measurably but only partially alleviated: +0.155 is still far below BP's self-cos of 1.0
+
+**Both modes are partially alleviated, neither fully**:
+- Penalty acc: 0.363 ± 0.0007 (3 seeds, λ=1e-2)
+- DFA-vanilla acc: 0.308 ± 0.014 (3 seeds)
+- DFA-shallow baseline: 0.349 ± 0.002
+- BP-trainable acc: 0.609 ± 0.004
+
+Penalty rescue is +5.5 pp over vanilla (mode 1 alleviated) and +1.4 pp over shallow (mode 2 partially alleviated). The remaining ~25 pp gap to BP (0.609 vs 0.363) reflects that mode 2 is only partially fixed: cos +0.155 is real but well below BP's 1.0.
+
+### 4.3 Disambiguating "penalty revealed" vs "penalty created" the alignment
+
+Round 19 disambiguation experiment: we trained vanilla DFA (seed 42) for 5 epochs, saved a checkpoint after each epoch, and measured deep-layer cos at every checkpoint:
+
+| epoch | ‖g_L‖ | meaningful? | l1 cos | l2 cos | l3 cos | l4 cos |
+|---:|---:|:---:|---:|---:|---:|---:|
+| 1 | 1.4e-6 | yes | +0.005 | -0.028 | -0.039 | -0.038 |
+| 2 | 3.2e-7 | yes | -0.002 | -0.040 | -0.055 | -0.054 |
+| 3 | 1.3e-7 | borderline | +0.007 | -0.039 | -0.054 | -0.054 |
+| 4 | 6.8e-8 | no | +0.013 | -0.034 | -0.052 | -0.052 |
+| 5 | 4.3e-8 | no | +0.016 | -0.036 | -0.055 | -0.055 |
+
+**Even at epoch 1, where ‖g_L‖ is well above the floor (in the meaningful measurement regime), the deep cosines are essentially zero.** Compare to penalized DFA at ep 30: deep cos ~+0.17.
+
+This means the penalty intervention **created** the +0.17 alignment; it did not merely make a previously hidden alignment measurable. The mechanism is plausibly as follows: with no penalty, the inner product `<f_l, e_T B_l^T>` can be increased indefinitely by inflating ‖f_l‖, so the optimizer pushes parameters in directions uncorrelated with the BP grad. With the penalty, ‖f_l‖ is constrained, so the optimizer must instead improve the direction of `f_l`, which incidentally yields better (partial) alignment with the BP grad.
+
+This is the strongest causal evidence we have: **the two modes are mechanistically distinct, and the penalty's role is not just numerical (preventing collapse) but training-trajectory-altering (creating partial alignment that wasn't there in vanilla)**.
+
+### 4.4 Partial alleviation explains the residual gap
+
+The remaining ~25 pp gap from penalized DFA (0.36) to BP-trainable (0.61) is explained by **partial credit alignment**: deep cos +0.17 vs BP's 1.0. A network trained with credit signals whose deep-layer cos to the true gradient is only ~0.17 reaches ~60% of BP's accuracy (0.363 / 0.609). The relationship between cos quality and accuracy is plausibly monotonic but not necessarily linear.
+
+The (d) diagnostic margin (penalty +1.4 pp over shallow) is consistent with this picture: the deep blocks contribute *some* useful signal (because cos > 0), but the magnitude of the contribution is small.
+
+### 4.5 Why this framing is paper-cleaner
+
+The new framing has several improvements over the original "scale + direction" claim:
+- **Empirically grounded**: both modes are directly measured (not inferred from observable proxies)
+- **Honest about measurement**: mode 2 is only measurable in the meaningful regime (i.e., in early vanilla epochs before ‖g_L‖ collapses, or after mode 1 is alleviated), and we say so explicitly
+- **Causal control**: the vanilla early-epoch checkpoint sweep disambiguates "penalty revealed" vs "penalty created"
+- **Null calibration**: fresh-Bs control rules out measurement artifacts
+- **Avoids the "two failure modes via (d)" claim**: (d) is now reframed as a depth-utilization measure, not a credit-quality test
 
 ## §5 Pipeline pitfalls catalog (appendix)
 
@@ -152,9 +199,13 @@ We walked back our own claims multiple times during this work. Reporting these e
 
 1. "DFA trains ViT-Mini to 24% accuracy" → walked back to "DFA-frozen-random-blocks ViT also gets 25.7%; the deep blocks are passengers" (codex round 6)
 2. "DFA trains ResMLP to 31%" → walked back to "DFA-trainable is 4 pp BELOW DFA-shallow on ResMLP; DFA training the deep blocks actively destroys value" (codex round 8)
-3. "Penalty rescues to 36.3% above shallow baseline → second failure mode established by (d)" → softened to "the (d) verdict on penalized DFA depends on the intervention strength; the two failure modes are partially dissociated by intervention" (codex round 18)
+3. "Penalty rescues to 36.3% above shallow baseline → second failure mode established by (d)" → softened to "the (d) verdict on penalized DFA depends on the intervention strength" (codex round 18)
 4. "Layer 0 always dominates Γ" → softened to "aggregation hides per-layer structure that depends on the architecture; on ResMLP layer 0 dominates, on ViT-Mini all layers are uniformly near zero" (round 18 follow-up)
 5. "(b) is causally specific to terminal LN" → softened to "(b) appears restricted to terminal-normalized architectures we audited" (round 18 follow-up)
+6. "Two failure modes (scale + direction)" → "one unified failure mode (scale → LN → measurement degeneracy)" (cos +0.17 walk-back, 2026-04-08, before disambiguation experiment)
+7. **"One unified failure mode" → "two distinct failure modes (measurement degeneracy + low intrinsic random-feedback alignment quality)"** (round 19 disambiguation, 2026-04-08, after vanilla early-epoch checkpoint sweep showed deep cos ~0 even in meaningful regime)
+
+The walk-back chain converges on a more honest framing each time. Walk-backs 6 and 7 happened on the same day; both are documented in memory and reflected here. The final two-distinct-modes framing is empirically grounded in direct measurement, null calibration, and a causal disambiguation control.
 
 ## Open experimental questions
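
The mode 1 degeneracy described in §4.1 of the diff above (a cosine computed against a gradient whose norm sits below the eps clamp) can be illustrated with a minimal Python sketch. This is not the repository's code: `clamped_cosine` and `passes_floor_check` are hypothetical names, the 10⁻⁷ floor is the value quoted in the outline, the 1e-8 eps is the default of `F.cosine_similarity`, and clamping each norm separately is an assumption about the clamp form.

```python
import math


def l2_norm(v):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def clamped_cosine(u, v, eps=1e-8):
    """Cosine similarity with each norm clamped from below by eps (assumed form)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (max(l2_norm(u), eps) * max(l2_norm(v), eps))


def passes_floor_check(grad, floor=1e-7):
    """Diagnostic (b)-style check (hypothetical helper): is the norm above the floor?"""
    return l2_norm(grad) >= floor


direction = [0.6, 0.8]                      # unit vector
healthy = [1e-6 * x for x in direction]     # norm ~1e-6, the penalized-DFA scale
collapsed = [1e-10 * x for x in direction]  # norm ~1e-10, the vanilla-DFA scale

# Self-cosine should be 1 for any nonzero vector; the clamp breaks this.
healthy_self_cos = clamped_cosine(healthy, healthy)
collapsed_self_cos = clamped_cosine(collapsed, collapsed)

print(passes_floor_check(healthy), healthy_self_cos)
print(passes_floor_check(collapsed), collapsed_self_cos)
```

In this toy case the collapsed vector's cosine with itself comes out near 1e-4 instead of 1, because both clamped norms are replaced by eps. That is the sense in which Γ is defined but uninterpretable, and why the floor check needs to run before any cosine is read.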
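
The accuracy margins quoted in §4.2 and §4.4 of the diff follow from the four reported mean accuracies. A short Python check, using only numbers that appear in the outline (the dictionary keys and the `pp_gap` helper are illustrative names, not taken from the repository):

```python
# Mean accuracies reported in §4.2 (4-block d=256 ResMLP).
acc = {
    "dfa_penalized": 0.363,  # λ=1e-2, 3 seeds
    "dfa_vanilla": 0.308,    # 3 seeds
    "dfa_shallow": 0.349,
    "bp_trainable": 0.609,
}


def pp_gap(a, b):
    """Difference acc[a] - acc[b] in percentage points, rounded to one decimal."""
    return round((acc[a] - acc[b]) * 100, 1)


over_vanilla = pp_gap("dfa_penalized", "dfa_vanilla")
over_shallow = pp_gap("dfa_penalized", "dfa_shallow")
gap_to_bp = pp_gap("bp_trainable", "dfa_penalized")
fraction_of_bp = round(acc["dfa_penalized"] / acc["bp_trainable"], 2)

print(over_vanilla, over_shallow, gap_to_bp, fraction_of_bp)
```

The first two values reproduce the +5.5 pp and +1.4 pp margins. The gap to BP evaluates to 24.6 pp, and penalized DFA reaches about 0.6 of BP's accuracy.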