summaryrefslogtreecommitdiff
path: root/report_explore
diff options
context:
space:
mode:
authorYurenHao0426 <Blackhao0426@gmail.com>2026-03-24 20:07:03 -0500
committerYurenHao0426 <Blackhao0426@gmail.com>2026-03-24 20:07:03 -0500
commit825d973428450cb24d8cccc8c2604235ef974b7c (patch)
tree865bf6f7cc5eabbdbbccfb5c14c927584dd1a4f8 /report_explore
parent5550e2cac45758e579810ae36bf716a0b819cebc (diff)
Add Phase 6: snapshot exploitability reveals local update rule is the bottleneck
Phase 6A: Better credit is ANTI-CORRELATED with loss decrease on fixed snapshot. DFA (Gamma=0.01) → dL=-0.0001 (only method that decreases loss) Vec_M4 (Gamma=0.38) → dL=+0.057 (increases loss most) Oracle BP (Gamma=1.0) → dL=+0.011 (still increases loss) Phase 6C: Target-shift rule reduces damage but cannot make non-DFA credits productive. The inner-product surrogate <F_l(h), a_l> is fundamentally mismatched with directional credit. Conclusion: Case B — the primary bottleneck is the local update paradigm itself, not the credit estimator quality or tracking/co-adaptation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'report_explore')
-rw-r--r--report_explore/MEMO_6A_snapshot_exploitability.md39
-rw-r--r--report_explore/MEMO_6_exploitability.md53
2 files changed, 92 insertions, 0 deletions
diff --git a/report_explore/MEMO_6A_snapshot_exploitability.md b/report_explore/MEMO_6A_snapshot_exploitability.md
new file mode 100644
index 0000000..950ed1b
--- /dev/null
+++ b/report_explore/MEMO_6A_snapshot_exploitability.md
@@ -0,0 +1,39 @@
+# Phase 6A Memo: Snapshot Exploitability
+
+**Date**: 2026-03-24
+**Config**: BP snapshot, CIFAR-10, L=4, d=256 (61.9% acc), seed=42
+
+## Question
+On a fixed snapshot, does better credit lead to better real loss decrease via the current local surrogate?
+
+## Results
+
+| Method | Gamma | rho | dL_1step | dL_5step | dL_20step |
+|--------|-------|-----|----------|----------|-----------|
+| DFA | 0.009 | -0.023 | **-0.0004** | **+0.0002** | **-0.0007** |
+| ScalarCB | 0.122 | 0.090 | +0.003 | +0.042 | +0.405 |
+| Vec_M4 | 0.378 | 0.411 | +0.003 | +0.050 | +0.272 |
+| Oracle BP | 1.000 | 0.998 | **-0.001** | +0.007 | +0.026 |
+
+## Key Finding: The Local Surrogate is Anti-Correlated with Credit Quality
+
+**Better credit produces WORSE loss change.** DFA (Gamma≈0) is the only method that decreases loss. ScalarCB (Gamma=0.12) and Vec (Gamma=0.38) both increase loss, with Vec slightly worse. Even Oracle BP increases loss at 5+ steps.
+
+The inner-product surrogate `L_local = <F_l(h_l), a_l>` is fundamentally broken as a local update rule for directional credit:
+- It treats a_l as a "desired direction for the residual output" rather than a gradient
+- The gradient of this surrogate w.r.t. block parameters pushes F_l(h) to align with a_l, but this is NOT the same as making h_{l+1} = h_l + F_l(h_l) move in the direction that decreases global loss
+- DFA "works" precisely because its random credits are small and roughly isotropic — the updates are near-random perturbations that don't systematically damage the representation
+
+## Verdict
+
+**This is Case B: the local update rule is the bottleneck, not the estimator or tracking.**
+
+Improving credit quality from DFA (Gamma=0.01) through ScalarCB (0.12) to Vec (0.38) to Oracle BP (1.0) does NOT improve — and actually worsens — real parameter update quality.
+
+## Implication
+
+The project should pivot from "better credit estimator" to "better local update coupling." The target-shift local regression rule (Phase 6C) is the natural next experiment:
+
+`L_shift = 0.5 * || h_l + F_l(h_l) - sg(h_{l+1} - eta * a_{l+1}^norm) ||^2`
+
+This directly tells each block: "adjust your output so the next hidden state moves toward the credit-indicated direction."
diff --git a/report_explore/MEMO_6_exploitability.md b/report_explore/MEMO_6_exploitability.md
new file mode 100644
index 0000000..42dfda5
--- /dev/null
+++ b/report_explore/MEMO_6_exploitability.md
@@ -0,0 +1,53 @@
+# Phase 6 Memo: Snapshot Exploitability + Local Update Rule Swap
+
+**Date**: 2026-03-24
+
+## Phase 6A: Snapshot Exploitability
+
+**Setup**: BP-trained CIFAR-10 snapshot (L=4, d=256, 61.9% acc). Train estimators on frozen features, then do k-step local updates and measure real loss change.
+
+### Results (5-step DeltaLoss, inner-product surrogate)
+
+| Credit | Gamma | rho | dL_5step |
+|--------|-------|-----|----------|
+| DFA | 0.009 | -0.023 | **-0.0001** |
+| ScalarCB | 0.122 | 0.090 | +0.042 |
+| Vec_M4 | 0.378 | 0.411 | +0.057 |
+| Oracle BP | 1.000 | 0.998 | +0.011 |
+
+**Finding**: Better credit quality is ANTI-CORRELATED with loss decrease. DFA (worst credit) produces the only method that doesn't increase loss. Vec (best credit) increases loss the most. Even Oracle BP increases loss at 5 steps.
+
+**Verdict**: This is **Case B** — the local update rule is the bottleneck.
+
+## Phase 6C: Local Update Rule Swap
+
+Tested target-shift rule (h_{l+1}^target = h_{l+1} - eta * a_norm) at eta in {0.01, 0.1, 0.3, 1.0}.
+
+### Results (5-step DeltaLoss)
+
+| Credit | inner_prod | shift_0.1 | shift_0.3 | shift_1.0 |
+|--------|:---:|:---:|:---:|:---:|
+| DFA | -0.0001 | **-0.0003** | +0.0004 | +0.001 |
+| Vec_M4 | +0.057 | +0.002 | +0.009 | +0.048 |
+| Oracle BP | +0.011 | +0.0002 | +0.001 | +0.005 |
+
+Target-shift reduces the damage but never achieves negative DeltaLoss for non-DFA credits. The cosine rule produces near-zero effects at all settings.
+
+## Root Cause Analysis
+
+The issue is deeper than the update rule. A BP-trained snapshot sits at a minimum of the full-backprop loss surface. Any local update that doesn't have access to the full gradient chain will push parameters in a direction that may locally align with the credit but globally increases loss. This is because:
+
+1. The inner-product surrogate `<F_l(h), a_l>` assumes a_l is the desired direction for the residual output. But even perfect credit (Oracle BP) doesn't produce good updates via this mechanism — the gradient of the surrogate w.r.t. block parameters is NOT the same as the gradient of the global loss.
+
+2. Target-shift reduces the magnitude of harmful updates but doesn't fix the direction. At small eta, updates are negligible. At large eta, the target shifts too far and becomes harmful.
+
+3. DFA "works" precisely because its random credits produce near-zero effective updates — it's approximately doing nothing, which is better than doing the wrong thing.
+
+## Implications
+
+**The project's fundamental limitation is NOT in the credit estimator.** It's in the local surrogate update paradigm itself. The inner-product surrogate `<F(h), a>` is not a valid proxy for global loss minimization, regardless of credit quality.
+
+**Potential directions:**
+1. Use credit to set per-block learning targets rather than gradients (e.g., knowledge distillation-style objectives)
+2. Use credit to modulate a more expressive local loss (e.g., local CE with projected targets)
+3. Abandon block-local updates entirely and use credit to define a global but differentiable auxiliary loss