From 825d973428450cb24d8cccc8c2604235ef974b7c Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Tue, 24 Mar 2026 20:07:03 -0500
Subject: Add Phase 6: snapshot exploitability reveals local update rule is the bottleneck
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 6A: Better credit is ANTI-CORRELATED with loss decrease on a fixed
snapshot.
  DFA (Gamma=0.01) → dL=-0.0001 (only method that decreases loss)
  Vec_M4 (Gamma=0.38) → dL=+0.057 (increases loss most)
  Oracle BP (Gamma=1.0) → dL=+0.011 (still increases loss)

Phase 6C: The target-shift rule reduces the damage but cannot make non-DFA
credits productive. The inner-product surrogate is fundamentally
mismatched with directional credit.

Conclusion: Case B. The primary bottleneck is the local update paradigm
itself, not the quality of the credit estimator or tracking/co-adaptation.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 report_explore/MEMO_6_exploitability.md | 53 +++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 report_explore/MEMO_6_exploitability.md

(limited to 'report_explore/MEMO_6_exploitability.md')

diff --git a/report_explore/MEMO_6_exploitability.md b/report_explore/MEMO_6_exploitability.md
new file mode 100644
index 0000000..42dfda5
--- /dev/null
+++ b/report_explore/MEMO_6_exploitability.md
@@ -0,0 +1,53 @@
+# Phase 6 Memo: Snapshot Exploitability + Local Update Rule Swap
+
+**Date**: 2026-03-24
+
+## Phase 6A: Snapshot Exploitability
+
+**Setup**: BP-trained CIFAR-10 snapshot (L=4, d=256, 61.9% accuracy). Train the credit estimators on the frozen features, then apply k local update steps and measure the real change in loss.
+
+### Results (5-step DeltaLoss, inner-product surrogate)
+
+| Credit | Gamma | rho | dL_5step |
+|--------|-------|-----|----------|
+| DFA | 0.009 | -0.023 | **-0.0001** |
+| ScalarCB | 0.122 | 0.090 | +0.042 |
+| Vec_M4 | 0.378 | 0.411 | +0.057 |
+| Oracle BP | 1.000 | 0.998 | +0.011 |
+
+**Finding**: Better credit quality is ANTI-CORRELATED with loss decrease. DFA (the worst credit) is the only method that does not increase loss. Vec_M4 (the best learned credit) increases loss the most. Even Oracle BP increases loss after 5 steps.
+
+**Verdict**: This is **Case B**: the local update rule is the bottleneck.
+
+## Phase 6C: Local Update Rule Swap
+
+Tested the target-shift rule (h_{l+1}^target = h_{l+1} - eta * a_norm) at eta in {0.01, 0.1, 0.3, 1.0}.
+
+### Results (5-step DeltaLoss)
+
+| Credit | inner_prod | shift_0.1 | shift_0.3 | shift_1.0 |
+|--------|:---:|:---:|:---:|:---:|
+| DFA | -0.0001 | **-0.0003** | +0.0004 | +0.001 |
+| Vec_M4 | +0.057 | +0.002 | +0.009 | +0.048 |
+| Oracle BP | +0.011 | +0.0002 | +0.001 | +0.005 |
+
+Target-shift reduces the damage but never achieves a negative DeltaLoss for non-DFA credits. The cosine rule produces near-zero effects at all settings.
+
+## Root Cause Analysis
+
+The issue is deeper than the update rule. A BP-trained snapshot sits at a minimum of the full-backprop loss surface. Any local update without access to the full gradient chain will push parameters in a direction that may align with the credit locally but increase the global loss. This is because:
+
+1. The inner-product surrogate `` assumes a_l is the desired direction for the residual output. But even perfect credit (Oracle BP) does not produce good updates through this mechanism: the gradient of the surrogate w.r.t. block parameters is NOT the same as the gradient of the global loss.
+
+2. Target-shift reduces the magnitude of harmful updates but does not fix their direction. At small eta, updates are negligible. At large eta, the target shifts too far and the update becomes harmful.
+
+3. DFA "works" precisely because its random credits produce near-zero effective updates: it is approximately doing nothing, which is better than doing the wrong thing.
+
+## Implications
+
+**The project's fundamental limitation is NOT in the credit estimator.** It lies in the local surrogate update paradigm itself. The inner-product surrogate `` is not a valid proxy for global loss minimization, regardless of credit quality.
+
+**Potential directions:**
+1. Use credit to set per-block learning targets rather than gradients (e.g., knowledge-distillation-style objectives)
+2. Use credit to modulate a more expressive local loss (e.g., local CE with projected targets)
+3. Abandon block-local updates entirely and use credit to define a global but differentiable auxiliary loss
-- 
cgit v1.2.3
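The Phase 6A protocol in the memo (freeze the snapshot, apply k local update steps, measure the real change in loss) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's code: `k_step_delta_loss`, `toy_global_loss` and `fixed_direction_step` are hypothetical names, the "snapshot" is a two-parameter quadratic whose minimum sits exactly at the snapshot (standing in for a BP-trained network at a minimum), and the local rule simply steps along a fixed credit-derived direction.

```python
from typing import Callable, List

Params = List[float]


def k_step_delta_loss(snapshot: Params,
                      global_loss: Callable[[Params], float],
                      local_step: Callable[[Params], Params],
                      k: int = 5) -> float:
    """Apply k local updates to a copy of a frozen snapshot and return
    the real change in the global loss (positive means the loss got worse)."""
    before = global_loss(snapshot)
    params = list(snapshot)  # copy, so the snapshot itself is never modified
    for _ in range(k):
        params = local_step(params)
    return global_loss(params) - before


# Hypothetical toy stand-in for a BP-trained snapshot: a quadratic global
# loss whose minimum lies exactly at the snapshot (all parameters zero).
CURVATURE = [1.0, 4.0]


def toy_global_loss(params: Params) -> float:
    return sum(c * p * p for c, p in zip(CURVATURE, params))


def fixed_direction_step(direction: Params, lr: float) -> Callable[[Params], Params]:
    """Hypothetical local rule: step along a fixed credit-derived direction
    without consulting the global loss."""
    def step(params: Params) -> Params:
        return [p - lr * d for p, d in zip(params, direction)]
    return step


snapshot = [0.0, 0.0]
dl_null = k_step_delta_loss(snapshot, toy_global_loss, fixed_direction_step([0.0, 0.0], 0.1))
dl_misaligned = k_step_delta_loss(snapshot, toy_global_loss, fixed_direction_step([1.0, 1.0], 0.1))
print(f"zero update: dL = {dl_null:+.4f}; fixed direction: dL = {dl_misaligned:+.4f}")
```

With the snapshot at a minimum, a zero update (the "doing nothing" case) leaves the loss unchanged, while any fixed non-zero direction raises it, by about 1.25 here after 5 steps. This mirrors the Root Cause Analysis argument in miniature; it does not reproduce the CIFAR-10 numbers.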
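The Phase 6C target-shift rule, h_{l+1}^target = h_{l+1} - eta * a_norm, can be sketched as follows. Two details are not stated in the memo and are assumptions here, also marked in the code: a_norm is taken to be the credit rescaled to unit L2 norm, and the block is pulled toward the (constant) target with a squared-error local loss. The helper names `unit`, `shifted_target` and `local_target_loss` are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def unit(a: Vec) -> Vec:
    """Rescale a credit vector to unit L2 norm (assumed meaning of a_norm)."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a] if norm > 0.0 else list(a)


def shifted_target(h_next: Vec, credit: Vec, eta: float) -> Vec:
    """h_{l+1}^target = h_{l+1} - eta * a_norm, treated as a constant target."""
    return [h - eta * a for h, a in zip(h_next, unit(credit))]


def local_target_loss(h_next: Vec, target: Vec) -> Tuple[float, Vec]:
    """Assumed squared-error local loss and its gradient w.r.t. the block output."""
    diff = [h - t for h, t in zip(h_next, target)]
    loss = 0.5 * sum(d * d for d in diff)
    return loss, diff  # d(loss)/d(h_{l+1}) = h_{l+1} - target = eta * a_norm


h_next = [1.0, 2.0]
credit = [3.0, 4.0]
for eta in (0.1, 1.0):
    loss, grad = local_target_loss(h_next, shifted_target(h_next, credit, eta))
    print(eta, round(loss, 6), [round(g, 6) for g in grad])
```

Under these assumptions the gradient reaching the block output is exactly eta * a_norm. Changing eta changes how hard the block is pushed (the local loss grows as 0.5 * eta^2), but the direction is always the credit's own, which is the sense in which target-shift reduces the magnitude of harmful updates without fixing their direction.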
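Root-cause point 3 in the memo (DFA's random credits produce near-zero effective updates) can be illustrated by measuring how well random directions align with a fixed gradient. This sketch is an idealization, not the project's DFA: each credit is modeled as an isotropic Gaussian vector independent of the true gradient (roughly what a fixed random feedback signal looks like to a snapshot whose weights were trained by BP), d = 256 is chosen to match the memo's width, and `random_credit_alignment` is a hypothetical name.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def cosine(u: Vec, v: Vec) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def random_credit_alignment(d: int = 256, trials: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Cosine between a fixed stand-in 'true gradient' and independent random
    credit directions (assumed isotropic); returns (mean cosine, RMS cosine)."""
    rng = random.Random(seed)
    true_grad = [rng.gauss(0.0, 1.0) for _ in range(d)]
    cosines = []
    for _ in range(trials):
        credit = [rng.gauss(0.0, 1.0) for _ in range(d)]
        cosines.append(cosine(true_grad, credit))
    mean = sum(cosines) / trials
    rms = math.sqrt(sum(c * c for c in cosines) / trials)
    return mean, rms


mean_cos, rms_cos = random_credit_alignment()
print(f"mean cos = {mean_cos:+.4f}, rms cos = {rms_cos:.4f}, 1/sqrt(d) = {1 / math.sqrt(256):.4f}")
```

The mean cosine comes out close to 0 and the RMS cosine close to 1/sqrt(d) = 0.0625. The first-order loss change from a step along -credit, roughly -lr * <grad, credit>, therefore averages to zero, which is consistent with the memo's reading that DFA is approximately doing nothing.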