author YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 02:02:37 -0500
committer YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 02:02:37 -0500
commit 78bd7ad68c174362e944c2b598beb859c2952c0b
tree 64593a228ef435551bea79225cfcf04d2ccbed46 /protocol
parent 55deb9a7d500a30557d901be09848fa430a32d80
PAPER_OUTLINE: round 20 language tightening + 5 validation summary

§4 updates per round 20:
- Soften 'confirmed' to 'strongly supports'
- Add §4.4 BP+penalty capacity-cost control with the round 20 phrasing: 'lower bound on residual gap under matched architecture/data/optimizer/penalty, after accounting for the penalty's direct capacity cost in BP'
- Add multi-seed lock-in to §4.3 (24 measurements all near zero)
- List 5 independent validations supporting the converged framing

The §4 narrative is now complete and the framing is locked.
Diffstat (limited to 'protocol')
-rw-r--r-- protocol/PAPER_OUTLINE.md 35
1 files changed, 30 insertions, 5 deletions
diff --git a/protocol/PAPER_OUTLINE.md b/protocol/PAPER_OUTLINE.md
index 6d1549e..3cc6b57 100644
--- a/protocol/PAPER_OUTLINE.md
+++ b/protocol/PAPER_OUTLINE.md
@@ -158,23 +158,48 @@ Round 19 disambiguation experiment: trained vanilla DFA s42 for 5 epochs and sav
This means the penalty intervention **created** the +0.17 alignment, it did not just make a previously-hidden alignment measurable. The mechanism is plausibly: with no penalty, the inner product `<f_l, e_T B_l^T>` can be increased indefinitely by inflating ‖f_l‖, so the optimizer pushes parameters in directions uncorrelated with BP grad. With the penalty, ‖f_l‖ is constrained, so the optimizer must instead orient the direction of `f_l` more carefully, which incidentally yields better (partial) alignment with BP grad.
-This is the strongest causal evidence we have: **the two modes are mechanistically distinct, and the penalty's role is not just numerical (preventing collapse) but training-trajectory-altering (creating partial alignment that wasn't there in vanilla)**.
+This is the strongest causal evidence we have, and it **strongly supports** (round 20 wording, replacing "confirmed") two claims: **the two modes are mechanistically distinct**, and **the penalty's role is not just numerical (preventing collapse) but training-trajectory-altering (creating partial alignment that wasn't there in vanilla)**. Multi-seed lock-in (3 seeds × {ep 1, ep 2} × 4 deep layers = 24 measurements) gives deep-layer cosines all in [-0.04, +0.02], with a 3-seed mean of -0.008 ± 0.013 at ep 1. This closes the single-seed-fluke objection.
-### 4.4 Partial alleviation explains the residual gap
+### 4.4 Capacity-cost control: BP+penalty 2×2
-The remaining 24 pp gap from penalized DFA (0.36) to BP-trainable (0.61) is explained by **partial credit alignment**: deep cos +0.17 vs BP's 1.0. A network trained with credit signals that are 17% aligned with the true gradient gets ~60% of BP's accuracy. The relationship between cos quality and accuracy is plausibly monotonic but not necessarily linear.
+To distinguish the residual depth-utilization gap from "the penalty's intrinsic capacity-regularization cost", we ran end-to-end BP with the same `λ ‖f_l(h_l)‖²` penalty for 30 epochs:
-The (d) diagnostic margin (penalty +1.4 pp over shallow) is consistent with this picture: the deep blocks contribute *some* useful signal (because cos > 0), but the magnitude of the contribution is small.
+| | no penalty | with penalty |
+|---|---:|---:|
+| BP | 0.609 | **0.530** |
+| DFA | 0.308 | 0.363 |
-### 4.5 Why this framing is paper-cleaner
+(All runs use the same architecture, data, and optimizer family. Entries are accuracy.)
+
+The penalty has **opposite effects** on BP and DFA: **−8 pp** capacity cost on BP, **+5.5 pp** rescue on DFA. BP+penalty still clears the DFA-shallow baseline by **+18.1 pp**, while DFA+penalty clears it by only +1.4 pp.
+
+**[Round 20 phrasing]**: this is *not* a clean isolation of "credit quality" in a vacuum — it identifies a **lower bound on the residual performance gap under matched architecture, data, optimizer family, and matched penalty, after accounting for the penalty's direct capacity cost in BP**. Stated more cautiously: *"matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment"*.
+
+A counterargument is that the penalty places BP into a fundamentally better optimization regime that is unrelated to capacity. This is unlikely: the penalty *hurts* BP by 8 pp while *helping* DFA by 5.5 pp. A generally beneficial regime shift would be expected to improve BP as well.
+
+### 4.5 Partial alleviation explains the residual gap
+
+The remaining ~24.6 pp gap runs from penalized DFA (0.363) to BP-trainable (0.609). About 8 pp of it matches the penalty's own capacity cost in BP (0.609 → 0.530). The other ~17 pp (BP+penalty 0.530 vs DFA+penalty 0.363) is the residual that §4.4 finds consistent with poorer credit assignment, that is, with only partial deep-layer alignment. The (d) diagnostic margin (penalty +1.4 pp over shallow) fits the same picture: the deep blocks contribute *some* useful signal (because cos > 0), but the magnitude is small.
+
+### 4.6 Why this framing is paper-cleaner
The new framing has several improvements over the original "scale + direction" claim:
- **Empirically grounded**: both modes are directly measured (not inferred from observable proxies)
- **Honest about measurement**: mode 2 is only measurable in the meaningful regime (i.e., after mode 1 is alleviated), and we say so explicitly
- **Causal control**: the vanilla early-epoch checkpoint sweep disambiguates "penalty revealed" vs "penalty created"
- **Null calibration**: fresh-Bs control rules out measurement artifacts
+- **Capacity-cost control**: the BP+penalty 2×2 separates the penalty's own capacity cost from a residual gap consistent with poorer credit assignment
+- **Multi-seed lock-in**: 3 seeds × {ep 1, ep 2} vanilla cosine closes the single-seed-fluke objection
- **Avoids the "two failure modes via (d)" claim**: (d) is now reframed as a depth-utilization measure, not a credit-quality test
+### Five independent validations supporting the framing
+
+1. Direct deep-layer cosine measurement on penalized DFA: 3-seed mean +0.155 ± 0.025
+2. Null calibration with 20 fresh random Bs: deep cos +0.002 ± 0.022, so the +0.16 measured with the training Bs is not a measurement artifact
+3. Hypothesis-B disambiguation (vanilla DFA at ep 1, with ‖g‖ in the meaningful regime): deep cos -0.008 ± 0.013 across 3 seeds, which strongly supports that the penalty creates (not just reveals) the alignment
+4. BP+penalty capacity-cost control: the penalty costs BP only ~8 pp (0.609 → 0.530), and the remaining ~17 pp gap to DFA+penalty (0.530 vs 0.363) is consistent with a credit-quality cost
+5. Multi-seed lock-in: 24 measurements (3 seeds × 2 epochs × 4 deep layers) all lie in [-0.04, +0.02], which closes the single-seed-fluke objection
+
## §5 Pipeline pitfalls catalog (appendix)
7 evaluation-pipeline bugs we found in our own dogfood codebase. Each has a reproducer in `protocol/examples/verify_pitfalls*.py`.
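§4.3 proposes a scale-versus-direction mechanism, and it can be illustrated on a toy model. The sketch below is not the training code. It assumes strong simplifications: `f_l` is treated as a free vector `f`, `d = e_T B_l^T` is held fixed, and `f` is scored by `<f, d> - λ‖f‖²`.

Without the penalty (λ = 0), the score grows without bound when `f` is simply rescaled. With λ > 0, write `f = c·u` and `s = <u, d>`. The best scale is `c* = s / (2λ‖u‖²)`, and the best reachable score along a fixed direction is `s² / (4λ‖u‖²) = ‖d‖² cos²θ / (4λ)`. The only remaining way to raise it is to rotate `f` toward `d`. All helper names here are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def unpenalized_score(f, d):
    """Toy score <f, d>: grows linearly when f is simply rescaled."""
    return dot(f, d)


def best_penalized_score(u, d, lam):
    """Best toy score <c*u, d> - lam * ||c*u||^2 over scales c >= 0.

    Only the direction of u matters: rescaling u changes the optimal
    scale c*, not the optimal score.
    """
    s = dot(u, d)
    if s <= 0:
        return 0.0  # pointing away from d: the best choice is c = 0
    c_star = s / (2 * lam * dot(u, u))
    return c_star * s - lam * c_star ** 2 * dot(u, u)


d = [1.0, 0.0]  # hypothetical stand-in for e_T B_l^T
u = [1.0, 1.0]  # hypothetical direction for f_l, cos = 1/sqrt(2)
lam = 0.5

print(unpenalized_score([10 * x for x in u], d))          # 10.0, i.e. 10x the score of u
print(best_penalized_score(u, d, lam))                    # 0.25
print(best_penalized_score([10 * x for x in u], d, lam))  # ~0.25: the scale is absorbed
print(norm(d) ** 2 * cosine(u, d) ** 2 / (4 * lam))       # closed form, ~0.25
```

In this toy model, a direction with a higher cosine to `d` always reaches a higher penalized score. This is the sense in which the penalty would push the optimizer toward better alignment.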
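The arithmetic behind §4.4 and §4.5 can be reproduced from the 2×2 table. In this sketch the DFA-shallow baseline is not a measured input. It is back-solved from the stated +1.4 pp margin of DFA+penalty (0.363 − 0.014 = 0.349). The dictionary layout and helper names are illustrative.

```python
# Accuracies from the §4.4 2x2 table, keyed by (method, penalized?).
ACC = {
    ("BP", False): 0.609,
    ("BP", True): 0.530,
    ("DFA", False): 0.308,
    ("DFA", True): 0.363,
}

# Assumption: the DFA-shallow baseline is back-solved from the stated
# +1.4 pp margin of DFA+penalty; it is not a separately reported number.
DFA_SHALLOW = ACC[("DFA", True)] - 0.014


def pp(delta):
    """Express an accuracy difference in percentage points (1 decimal)."""
    return round(100 * delta, 1)


# Effect of adding the penalty within each method.
penalty_effect = {m: pp(ACC[(m, True)] - ACC[(m, False)]) for m in ("BP", "DFA")}

# Margin of each penalized method over the DFA-shallow baseline.
margin_over_shallow = {m: pp(ACC[(m, True)] - DFA_SHALLOW) for m in ("BP", "DFA")}

# §4.5 decomposition of the vanilla-BP vs penalized-DFA gap.
total_gap = pp(ACC[("BP", False)] - ACC[("DFA", True)])
capacity_cost = pp(ACC[("BP", False)] - ACC[("BP", True)])
residual = pp(ACC[("BP", True)] - ACC[("DFA", True)])

print(penalty_effect)                      # {'BP': -7.9, 'DFA': 5.5}
print(margin_over_shallow)                 # {'BP': 18.1, 'DFA': 1.4}
print(total_gap, capacity_cost, residual)  # 24.6 7.9 16.7
```

The decomposition is additive by construction: the penalty's capacity cost in BP plus the matched-penalty residual equals the total gap (7.9 + 16.7 = 24.6 pp).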
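The multi-seed lock-in (§4.3 and validation 5) reduces to a small summary over a (seed, epoch, layer) grid. This sketch makes two assumptions. First, it assumes a hypothetical layout: a dict mapping `(seed, epoch, layer)` to a deep-layer cosine. Second, it assumes the reported `±` is the sample standard deviation of per-seed means, since the outline does not say which spread statistic is used. It reports mean ± std per epoch, whether every measurement falls inside the stated [-0.04, +0.02] band, and how many measurements were checked.

```python
import statistics
from collections import defaultdict


def lock_in_summary(cos, lo=-0.04, hi=0.02):
    """Summarize deep-layer cosines keyed by (seed, epoch, layer).

    Returns ({epoch: (mean, std)}, all_in_band, n_measurements). Mean and
    std are taken over per-seed averages across layers; statistics.stdev
    is the sample standard deviation and needs at least two seeds.
    """
    # Group individual measurements by (epoch, seed).
    per_seed = defaultdict(list)
    for (seed, epoch, _layer), value in cos.items():
        per_seed[(epoch, seed)].append(value)

    # Average over layers within each seed, then collect per epoch.
    seed_means = defaultdict(list)
    for (epoch, _seed), values in per_seed.items():
        seed_means[epoch].append(statistics.fmean(values))

    stats = {
        epoch: (statistics.fmean(means), statistics.stdev(means))
        for epoch, means in sorted(seed_means.items())
    }
    all_in_band = all(lo <= v <= hi for v in cos.values())
    return stats, all_in_band, len(cos)
```

With 3 seeds × {ep 1, ep 2} × 4 deep layers, the grid has 24 entries, so `n_measurements` should be 24.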