| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 02:04:28 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 02:04:28 -0500 |
| commit | 8bf53ab94ac31c7672d23e2edf0e40c787b157d4 (patch) | |
| tree | 42621f1ae96a98c3a3294436c6e9fd21a4b6e274 /protocol | |
| parent | 78bd7ad68c174362e944c2b598beb859c2952c0b (diff) | |
EVIDENCE_SUMMARY: §4 fully rewritten under locked two-distinct-modes framing

§4 now reflects all 5 independent validations of the converged framing:
1. Direct deep cos on penalized DFA (3 seeds): +0.155 ± 0.025
2. Null calibration with fresh Bs: +0.002 ± 0.022 (real signal)
3. Hypothesis B disambiguation (vanilla early ep): -0.008 ± 0.013
4. BP+penalty 2×2 control: 17 pp residual = credit quality
5. Multi-seed lock-in: 24 measurements all near zero

Round 20 language tightening applied:
- 'lower bound on non-capacity gap' instead of 'clean isolation'
- Explicit caveats about end-to-end vs local-loss difference
- Counter to 'different optimization regime' objection

The §4 framing is locked. Five independent validations done. Stop
iterating, start writing.
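
Validation 2 in the message above rests on a null comparison: the cosine between the BP gradient and a DFA-style credit signal (the output error projected through a fixed random matrix) is recomputed with feedback matrices that were never used in training. The following is a minimal standard-library sketch of that idea on toy vectors. It is not the repository's measurement script; the function names, the dimensions, and the Gaussian draw for the feedback matrices are all assumptions made for illustration.

```python
import math
import random
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dfa_signal(error, feedback):
    """Project the output error through a fixed feedback matrix.

    `error` has n_out entries; `feedback` has one row of n_out entries
    per hidden unit, so the result has one entry per hidden unit.
    """
    return [sum(e * b for e, b in zip(error, row)) for row in feedback]


def random_feedback(rng, hidden_dim, n_out):
    """Draw a fresh Gaussian feedback matrix (hypothetical initialisation)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(hidden_dim)]


def null_calibration(bp_grad, error, n_fresh=20, seed=0):
    """Mean and sample std of cos(BP grad, DFA signal) over fresh feedback draws."""
    rng = random.Random(seed)
    cosines = []
    for _ in range(n_fresh):
        feedback = random_feedback(rng, len(bp_grad), len(error))
        cosines.append(cosine(bp_grad, dfa_signal(error, feedback)))
    return statistics.mean(cosines), statistics.stdev(cosines)


def z_score(observed, null_mean, null_std):
    """How many null standard deviations the observed cosine sits from the null mean."""
    return (observed - null_mean) / null_std


# Toy stand-ins for one layer's BP gradient and the output error.
toy_rng = random.Random(42)
toy_bp_grad = [toy_rng.gauss(0.0, 1.0) for _ in range(512)]
toy_error = [toy_rng.gauss(0.0, 1.0) for _ in range(10)]

null_mean, null_std = null_calibration(toy_bp_grad, toy_error)
```

With untrained feedback the toy null cosine centres near zero, which is the behaviour the fresh-Bs row reports. If the ± 0.022 in item 2 is read as the spread of that null (an assumption; the message does not say), `z_score(0.155, 0.002, 0.022)` puts the trained-B deep cosine roughly seven null standard deviations away from it.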
Diffstat (limited to 'protocol')
| -rw-r--r-- | protocol/EVIDENCE_SUMMARY.md | 63 |
1 files changed, 50 insertions, 13 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index 0da2e75..d1d6d9b 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -84,20 +84,57 @@ itself is well-defined.
 
 Reproduce: `python -m protocol.examples.audit_cnn`
 
-## §4 Two failure modes
+## §4 Two distinct failure modes (LOCKED — round 20)
 
-| evidence | result | reproduce |
-|---|---|---|
-| Penalty rescue (3 seeds, λ=1e-2) | DFA acc 0.308 → 0.363, ‖h_L‖ 4e8 → 4e4, ‖g_L‖ 5e-10 → 1e-6 | `dfa_residual_penalty_test.py --lam 1e-2` |
-| **Penalty partial protocol audit** | Penalized DFA: (a)+(b) **PASS** (penalty fixes scale), but (d) **STILL FIRES** on 3/3 seeds (margin 1.38 ± 0.05 pp < 2 pp) | `python -m protocol.examples.penalty_partial_audit` |
-| Vanilla DFA per-layer cosine (3 seeds) | layer 0: cos = +0.42 (high), layers 1-4: cos ≈ 0 (range -0.03 to +0.03). Headline +0.07 is entirely from layer 0. | `python experiments/measure_direction_quality_existing_ckpt.py --seed 42` |
-
-The two putative failure modes are **partially dissociated by intervention**
-(round 18 softening): the penalty alleviates the scale-related diagnostics
-(a)+(b) while the frozen-baseline diagnostic (d) still fires. (d) provides
-independent evidence that poor use of depth persists after the scale
-pathology is reduced. Full mechanistic separability requires direct
-deep-block credit measurement on the penalized checkpoint (in progress).
+The §4 framing is locked after rounds 18-20. Two distinct failure modes,
+five independent validations.
+
+### Mode 1: measurement degeneracy via terminal-LayerNorm gradient cancellation
+
+Residual stream growth → BP gradient at hidden layers collapses below the
+1e-7 floor → cosine alignment metric measured against degenerate reference.
+**Caught by diagnostic (b).** Direct empirical evidence is the 5-method
+audit table where DFA/SB/CB all have ‖g_L‖ ~ 1e-9.
+
+### Mode 2: low intrinsic credit-direction quality of random feedback
+
+Even in the meaningful regime (vanilla DFA at ep 1, ‖g‖ ~ 10⁻⁶), DFA's
+local credit signal `e_T B_l^T` is essentially uncorrelated with BP grad
+on deep layers. **Caught by direct per-layer cosine measurement.**
+
+### Five independent validations of the converged framing
+
+| # | evidence | result | reproduce |
+|---|---|---|---|
+| 1 | Direct deep-layer cosine on penalized DFA, 3 seeds | layer-mean +0.186 ± 0.007; deep mean +0.155 ± 0.025 | `experiments/measure_direction_quality_existing_ckpt.py` on `results/dfa_pen_short/dfa_pen_lam0.01_s{42,123,456}.pt` |
+| 2 | Null calibration with 20 fresh random Bs | training-Bs deep cos +0.16 vs fresh-Bs +0.002 ± 0.022 | `experiments/null_calibration_penalized_cos.py` |
+| 3 | Hypothesis B disambiguation (vanilla early-epoch) | vanilla deep cos -0.008 ± 0.013 across 3 seeds × ep 1, even with ‖g‖ in meaningful regime | `experiments/vanilla_dfa_early_ckpt.py` + measure script |
+| 4 | BP+penalty capacity-cost 2×2 control | BP+pen acc 0.530 (-8 pp); DFA+pen 0.363 (+5.5 pp); 17 pp residual gap consistent with credit quality | `experiments/bp_with_penalty_control.py` |
+| 5 | Multi-seed lock-in (round 20) | 24 measurements (3 seeds × 2 epochs × 4 deep layers) all in [-0.04, +0.02] | iterate measure script over s42/s123/s456 × ep1/ep2 |
+
+### Penalty rescue 3-seed table (lam=1e-2)
+
+| seed | acc | ‖h_L‖ | ‖g_2‖ | deep cos l1-l4 mean |
+|---:|---:|---:|---:|---:|
+| 42 | 0.363 | 3.8e4 | 9.9e-7 | +0.163 |
+| 123 | 0.362 | 4.1e4 | 8.1e-7 | +0.151 |
+| 456 | 0.364 | 4.1e4 | 9.0e-7 | +0.139 |
+| **mean** | **0.363 ± 0.001** | **4.0e4** | **9.0e-7** | **+0.151 ± 0.012** |
+
+### BP+penalty 2×2 grid (raw acc, primary number per round 20)
+
+| | no penalty | with penalty | penalty effect |
+|---|---:|---:|---:|
+| BP | 0.609 | **0.530** | −8 pp (capacity loss) |
+| DFA | 0.308 | 0.363 | +5.5 pp (rescue) |
+
+### Round 20 phrasing for the gap
+
+**Lower bound on non-capacity gap**: matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment.
+
+**Cannot rule out (caveats)**:
+- BP uses end-to-end loss, DFA uses local block losses — the 2×2 isn't a perfectly clean isolation of "credit quality" in a vacuum
+- The "different optimization regime" objection: penalty hurts BP (-8 pp) while helping DFA (+5.5 pp), opposite of what a generally-beneficial regime shift would do, so this is unlikely but not airtight
 
 ## §5 Pipeline pitfalls reproducers
 
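
The arithmetic behind the BP+penalty 2×2 grid and the 3-seed summary row in §4 can be rechecked directly from the tabulated numbers. The sketch below only re-derives the reported differences from the values printed in the diff. It assumes the ± figures are sample standard deviations across seeds (the commit does not say), and the dictionary layout and function names are illustrative rather than taken from the repository.

```python
import statistics

# Raw accuracies copied from the BP+penalty 2x2 grid in this commit.
ACC = {
    ("BP", "no_penalty"): 0.609,
    ("BP", "penalty"): 0.530,
    ("DFA", "no_penalty"): 0.308,
    ("DFA", "penalty"): 0.363,
}

# Per-seed values copied from the penalty rescue 3-seed table (seeds 42, 123, 456).
SEED_ACC = [0.363, 0.362, 0.364]
SEED_DEEP_COS = [0.163, 0.151, 0.139]


def to_pp(delta):
    """Convert an accuracy difference to percentage points, one decimal."""
    return round(100.0 * delta, 1)


def penalty_effect_pp(method):
    """Accuracy change from adding the penalty, for one training method."""
    return to_pp(ACC[(method, "penalty")] - ACC[(method, "no_penalty")])


def bp_minus_dfa_gap_pp(condition):
    """BP-over-DFA accuracy gap under one penalty condition."""
    return to_pp(ACC[("BP", condition)] - ACC[("DFA", condition)])


def mean_and_std(values):
    """Mean and sample standard deviation, rounded to three decimals."""
    return round(statistics.mean(values), 3), round(statistics.stdev(values), 3)


bp_effect = penalty_effect_pp("BP")            # -7.9
dfa_effect = penalty_effect_pp("DFA")          # 5.5
gap_without = bp_minus_dfa_gap_pp("no_penalty")  # 30.1
gap_with = bp_minus_dfa_gap_pp("penalty")        # 16.7

acc_mean, acc_std = mean_and_std(SEED_ACC)            # (0.363, 0.001)
cos_mean, cos_std = mean_and_std(SEED_DEEP_COS)       # (0.151, 0.012)
```

Under these assumptions the matched with-penalty gap comes out at 16.7 pp, which the commit rounds to 17 pp, and the BP penalty effect at -7.9 pp, reported as -8 pp. The opposite signs of `bp_effect` and `dfa_effect` are the quantity the second caveat leans on.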
