author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 02:04:28 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 02:04:28 -0500
commit: 8bf53ab94ac31c7672d23e2edf0e40c787b157d4
tree: 42621f1ae96a98c3a3294436c6e9fd21a4b6e274 /protocol
parent: 78bd7ad68c174362e944c2b598beb859c2952c0b
EVIDENCE_SUMMARY: §4 fully rewritten under locked two-distinct-modes framing
§4 now reflects all 5 independent validations of the converged framing:

1. Direct deep cos on penalized DFA (3 seeds): +0.155 ± 0.025
2. Null calibration with fresh Bs: +0.002 ± 0.022 (real signal)
3. Hypothesis B disambiguation (vanilla early ep): -0.008 ± 0.013
4. BP+penalty 2×2 control: 17 pp residual = credit quality
5. Multi-seed lock-in: 24 measurements all near zero

Round 20 language tightening applied:

- 'lower bound on non-capacity gap' instead of 'clean isolation'
- Explicit caveats about end-to-end vs local-loss difference
- Counter to 'different optimization regime' objection

The §4 framing is locked. Five independent validations done. Stop iterating, start writing.
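
The "17 pp residual" in item 4 can be rechecked from the four raw accuracies in the BP+penalty 2×2 grid of the diff. The snippet below is only an arithmetic sketch: the dictionary layout and variable names are illustrative and are not taken from `experiments/bp_with_penalty_control.py`.

```python
# Raw accuracies copied from the BP+penalty 2x2 grid in the diff.
# The dict layout and names are illustrative, not from the repository.
acc = {
    ("BP", "no_penalty"): 0.609,
    ("BP", "penalty"): 0.530,
    ("DFA", "no_penalty"): 0.308,
    ("DFA", "penalty"): 0.363,
}


def to_pp(delta):
    """Convert an accuracy difference (fraction) to percentage points."""
    return 100.0 * delta


# Effect of the penalty within each training method.
bp_effect = to_pp(acc[("BP", "penalty")] - acc[("BP", "no_penalty")])
dfa_effect = to_pp(acc[("DFA", "penalty")] - acc[("DFA", "no_penalty")])

# BP-minus-DFA gap without and with the penalty applied to both.
gap_before = to_pp(acc[("BP", "no_penalty")] - acc[("DFA", "no_penalty")])
gap_after = to_pp(acc[("BP", "penalty")] - acc[("DFA", "penalty")])

print(f"penalty effect on BP:  {bp_effect:+.1f} pp")   # -7.9 pp
print(f"penalty effect on DFA: {dfa_effect:+.1f} pp")  # +5.5 pp
print(f"gap without penalty:   {gap_before:.1f} pp")   # 30.1 pp
print(f"gap with penalty:      {gap_after:.1f} pp")    # 16.7 pp
```

The matched-penalty gap of 16.7 pp rounds to the 17 pp quoted above, and the BP effect of -7.9 pp rounds to the -8 pp shown in the grid.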
Diffstat (limited to 'protocol')
-rw-r--r-- protocol/EVIDENCE_SUMMARY.md 63
1 files changed, 50 insertions, 13 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index 0da2e75..d1d6d9b 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -84,20 +84,57 @@ itself is well-defined.
Reproduce: `python -m protocol.examples.audit_cnn`
-## §4 Two failure modes
+## §4 Two distinct failure modes (LOCKED — round 20)
-| evidence | result | reproduce |
-|---|---|---|
-| Penalty rescue (3 seeds, λ=1e-2) | DFA acc 0.308 → 0.363, ‖h_L‖ 4e8 → 4e4, ‖g_L‖ 5e-10 → 1e-6 | `dfa_residual_penalty_test.py --lam 1e-2` |
-| **Penalty partial protocol audit** | Penalized DFA: (a)+(b) **PASS** (penalty fixes scale), but (d) **STILL FIRES** on 3/3 seeds (margin 1.38 ± 0.05 pp < 2 pp) | `python -m protocol.examples.penalty_partial_audit` |
-| Vanilla DFA per-layer cosine (3 seeds) | layer 0: cos = +0.42 (high), layers 1-4: cos ≈ 0 (range -0.03 to +0.03). Headline +0.07 is entirely from layer 0. | `python experiments/measure_direction_quality_existing_ckpt.py --seed 42` |
-
-The two putative failure modes are **partially dissociated by intervention**
-(round 18 softening): the penalty alleviates the scale-related diagnostics
-(a)+(b) while the frozen-baseline diagnostic (d) still fires. (d) provides
-independent evidence that poor use of depth persists after the scale
-pathology is reduced. Full mechanistic separability requires direct
-deep-block credit measurement on the penalized checkpoint (in progress).
+The §4 framing is locked after rounds 18-20. Two distinct failure modes,
+five independent validations.
+
+### Mode 1: measurement degeneracy via terminal-LayerNorm gradient cancellation
+
+Residual stream growth → BP gradient at hidden layers collapses below the
+1e-7 floor → cosine alignment metric measured against degenerate reference.
+**Caught by diagnostic (b).** Direct empirical evidence is the 5-method
+audit table where DFA/SB/CB all have ‖g_L‖ ~ 1e-9.
+
+### Mode 2: low intrinsic credit-direction quality of random feedback
+
+Even in the meaningful regime (vanilla DFA at ep 1, ‖g‖ ~ 10⁻⁶), DFA's
+local credit signal `e_T B_l^T` is essentially uncorrelated with BP grad
+on deep layers. **Caught by direct per-layer cosine measurement.**
+
+### Five independent validations of the converged framing
+
+| # | evidence | result | reproduce |
+|---|---|---|---|
+| 1 | Direct deep-layer cosine on penalized DFA, 3 seeds | layer-mean +0.186 ± 0.007; deep mean +0.155 ± 0.025 | `experiments/measure_direction_quality_existing_ckpt.py` on `results/dfa_pen_short/dfa_pen_lam0.01_s{42,123,456}.pt` |
+| 2 | Null calibration with 20 fresh random Bs | training-Bs deep cos +0.16 vs fresh-Bs +0.002 ± 0.022 | `experiments/null_calibration_penalized_cos.py` |
+| 3 | Hypothesis B disambiguation (vanilla early-epoch) | vanilla deep cos -0.008 ± 0.013 across 3 seeds × ep 1, even with ‖g‖ in meaningful regime | `experiments/vanilla_dfa_early_ckpt.py` + measure script |
+| 4 | BP+penalty capacity-cost 2×2 control | BP+pen acc 0.530 (-8 pp); DFA+pen 0.363 (+5.5 pp); 17 pp residual gap consistent with credit quality | `experiments/bp_with_penalty_control.py` |
+| 5 | Multi-seed lock-in (round 20) | 24 measurements (3 seeds × 2 epochs × 4 deep layers) all in [-0.04, +0.02] | iterate measure script over s42/s123/s456 × ep1/ep2 |
+
+### Penalty rescue 3-seed table (lam=1e-2)
+
+| seed | acc | ‖h_L‖ | ‖g_2‖ | deep cos l1-l4 mean |
+|---:|---:|---:|---:|---:|
+| 42 | 0.363 | 3.8e4 | 9.9e-7 | +0.163 |
+| 123 | 0.362 | 4.1e4 | 8.1e-7 | +0.151 |
+| 456 | 0.364 | 4.1e4 | 9.0e-7 | +0.139 |
+| **mean** | **0.363 ± 0.001** | **4.0e4** | **9.0e-7** | **+0.151 ± 0.012** |
+
+### BP+penalty 2×2 grid (raw acc, primary number per round 20)
+
+| | no penalty | with penalty | penalty effect |
+|---|---:|---:|---:|
+| BP | 0.609 | **0.530** | −8 pp (capacity loss) |
+| DFA | 0.308 | 0.363 | +5.5 pp (rescue) |
+
+### Round 20 phrasing for the gap
+
+**Lower bound on non-capacity gap**: matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment.
+
+**Cannot rule out (caveats)**:
+- BP uses end-to-end loss, DFA uses local block losses — the 2×2 isn't a perfectly clean isolation of "credit quality" in a vacuum
+- The "different optimization regime" objection: penalty hurts BP (-8 pp) while helping DFA (+5.5 pp), opposite of what a generally-beneficial regime shift would do, so this is unlikely but not airtight
## §5 Pipeline pitfalls reproducers
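
Referring back to §4 (Mode 2 and validation 2): the per-layer cosine between the DFA credit signal `e_T B_l^T` and the BP gradient, together with a null calibration over fresh random feedback matrices, can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the code in `experiments/null_calibration_penalized_cos.py`. Every function name, shape, and the synthetic stand-in for the BP gradient are assumptions made for the example.

```python
import numpy as np


def rowwise_cosine(a, b, eps=1e-12):
    """Cosine similarity between matching rows of a and b."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def dfa_credit(e, B):
    """DFA credit signal e B^T for one layer.

    e: (batch, n_out) output error, B: (n_hidden, n_out) feedback matrix.
    """
    return e @ B.T


def null_calibration(e, g_bp, n_draws=20, seed=0):
    """Mean and sample std of the batch-mean cosine over fresh random Bs."""
    rng = np.random.default_rng(seed)
    n_hidden, n_out = g_bp.shape[1], e.shape[1]
    draws = []
    for _ in range(n_draws):
        B_fresh = rng.standard_normal((n_hidden, n_out))
        draws.append(rowwise_cosine(dfa_credit(e, B_fresh), g_bp).mean())
    draws = np.asarray(draws)
    return draws.mean(), draws.std(ddof=1)


# Synthetic stand-ins (assumptions): W plays the role of whatever maps the
# output error back to the hidden layer under BP; B_train is a feedback
# matrix that is partially aligned with it.
rng = np.random.default_rng(0)
batch, n_hidden, n_out = 64, 256, 10
e = rng.standard_normal((batch, n_out))
W = rng.standard_normal((n_hidden, n_out))
g_bp = e @ W.T
B_train = W + 0.5 * rng.standard_normal((n_hidden, n_out))

cos_train = rowwise_cosine(dfa_credit(e, B_train), g_bp).mean()
null_mean, null_std = null_calibration(e, g_bp, n_draws=20, seed=1)

print(f"training-B cosine: {cos_train:+.3f}")
print(f"fresh-B null:      {null_mean:+.3f} ± {null_std:.3f}")
```

The design point is that a measured cosine is only read as signal when it sits well outside the fresh-B distribution, which is the comparison validation 2 reports (+0.16 against +0.002 ± 0.022).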