# Phase 6A Memo: Snapshot Exploitability

**Date**: 2026-03-24
**Config**: BP snapshot, CIFAR-10, L=4, d=256 (61.9% acc), seed=42

## Question
On a fixed snapshot, does better credit produce a larger decrease in the real loss when parameters are updated through the current local surrogate?

## Results

| Method | Gamma | rho | dL_1step | dL_5step | dL_20step |
|--------|-------|-----|----------|----------|-----------|
| DFA | 0.009 | -0.023 | **-0.0004** | **+0.0002** | **-0.0007** |
| ScalarCB | 0.122 | 0.090 | +0.003 | +0.042 | +0.405 |
| Vec_M4 | 0.378 | 0.411 | +0.003 | +0.050 | +0.272 |
| Oracle BP | 1.000 | 0.998 | **-0.001** | +0.007 | +0.026 |
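
The table can be read off mechanically. The following is a minimal sketch that only restates the reported numbers (nothing is re-measured); the dictionary layout and the helper names `decreasing_methods` and `ranked_by_loss_change` are illustrative choices, not part of the project code.

```python
# Values copied from the Results table above; this does not re-run any experiment.
results = {
    "DFA":       {"gamma": 0.009, "dL": {1: -0.0004, 5: 0.0002, 20: -0.0007}},
    "ScalarCB":  {"gamma": 0.122, "dL": {1: 0.003, 5: 0.042, 20: 0.405}},
    "Vec_M4":    {"gamma": 0.378, "dL": {1: 0.003, 5: 0.050, 20: 0.272}},
    "Oracle BP": {"gamma": 1.000, "dL": {1: -0.001, 5: 0.007, 20: 0.026}},
}


def decreasing_methods(steps):
    """Methods whose loss change at the given horizon is negative."""
    return [name for name, row in results.items() if row["dL"][steps] < 0]


def ranked_by_loss_change(steps):
    """Method names sorted from best (most negative dL) to worst."""
    return sorted(results, key=lambda name: results[name]["dL"][steps])


for steps in (1, 5, 20):
    print(steps, decreasing_methods(steps), ranked_by_loss_change(steps))
```

Ranking by loss change rather than by Gamma makes it easy to see, per horizon, where the two orderings disagree.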

## Key Finding: The Local Surrogate is Anti-Correlated with Credit Quality

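**Better credit does not produce better loss change, and relative to DFA it produces WORSE loss change at 5 and 20 steps.** DFA (Gamma≈0) is the only method whose loss change stays within ±0.001 at all three horizons, and the only one that decreases loss at 20 steps. ScalarCB (Gamma=0.12) and Vec_M4 (Gamma=0.38) increase loss at every horizon; Vec_M4 is slightly worse at 5 steps (+0.050 vs +0.042) but better at 20 steps (+0.272 vs +0.405). Even Oracle BP, which decreases loss at 1 step (-0.001), increases it at 5 and 20 steps (+0.007, +0.026).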

The inner-product surrogate `L_local = <F_l(h_l), a_l>` is fundamentally broken as a local update rule for directional credit:
- It treats a_l as a "desired direction for the residual output" rather than a gradient
- The gradient of this surrogate w.r.t. block parameters pushes F_l(h) to align with a_l, but this is NOT the same as making h_{l+1} = h_l + F_l(h_l) move in the direction that decreases global loss
- DFA "works" precisely because its random credits are small and roughly isotropic — the updates are near-random perturbations that don't systematically damage the representation
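
To make the mechanics of the inner-product surrogate concrete, here is a minimal pure-Python sketch. It assumes a linear stand-in block `F(h) = W h`, and assumes the surrogate is minimized by plain gradient descent; the actual blocks, optimizer, and sign convention in the project may differ (flip the sign of the step if the convention is to maximize alignment). All names and numbers are hypothetical.

```python
def matvec(W, h):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def local_surrogate(W, h, a):
    """L_local = <F(h), a> with the linear stand-in F(h) = W h."""
    return dot(matvec(W, h), a)


def surrogate_grad(h, a):
    """dL_local/dW = outer(a, h). It does not depend on W: L_local is linear in W."""
    return [[ai * hj for hj in h] for ai in a]


def sgd_step(W, h, a, lr):
    """One gradient-descent step on L_local (assumed convention: minimize)."""
    G = surrogate_grad(h, a)
    return [[w - lr * g for w, g in zip(wrow, grow)] for wrow, grow in zip(W, G)]


# Toy values, chosen only so the arithmetic is easy to follow.
W = [[0.5, -0.2], [0.1, 0.3]]
h = [1.0, 2.0]
a = [0.6, -0.8]   # unit-norm credit vector
lr = 0.1

before = local_surrogate(W, h, a)
W_new = sgd_step(W, h, a, lr)
after = local_surrogate(W_new, h, a)

# Each step shifts F(h) by -lr * ||h||^2 * a, so L_local drops by lr * ||h||^2 * ||a||^2.
print(before, after, after - before)
```

In this toy setting the surrogate is linear in the block output, so it has no minimizer: every step shifts `F(h)` by the same amount along `-a`, and nothing in the objective says when to stop.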

## Verdict

**This is Case B: the local update rule is the bottleneck, not the estimator or tracking.**

Improving credit quality from DFA (Gamma=0.01) through ScalarCB (0.12) and Vec (0.38) to Oracle BP (1.0) does NOT improve real parameter update quality: every higher-Gamma method has a worse 5-step and 20-step loss change than DFA.

## Implication

The project should pivot from "better credit estimator" to "better local update coupling." The target-shift local regression rule (Phase 6C) is the natural next experiment:

`L_shift = 0.5 * || h_l + F_l(h_l) - sg(h_{l+1} - eta * a_{l+1}^norm) ||^2`

This directly tells each block: "adjust your output so the next hidden state moves toward the credit-indicated direction."
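
A minimal pure-Python sketch of the target-shift loss, for illustration only. It assumes a linear stand-in block `F(h) = W h`, reads `sg(.)` as stop-gradient (the target is treated as a constant), and reads `a^norm` as the credit vector scaled to unit length; these are assumptions about the notation, and the function names and numbers are hypothetical.

```python
import math


def matvec(W, h):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def normalize(a):
    """Scale a to unit length (assumed meaning of a^norm); zero vectors pass through."""
    n = math.sqrt(sum(x * x for x in a))
    return list(a) if n == 0 else [x / n for x in a]


def shift_target(h_next, a_next, eta):
    """Constant target h_{l+1} - eta * a^norm; no gradient flows through it (the sg role)."""
    a_norm = normalize(a_next)
    return [x - eta * an for x, an in zip(h_next, a_norm)]


def shift_loss_and_grad(W, h, target):
    """L_shift = 0.5 * ||h + W h - target||^2 and its gradient w.r.t. W."""
    out = [hi + fi for hi, fi in zip(h, matvec(W, h))]
    r = [o - t for o, t in zip(out, target)]          # residual to the shifted target
    loss = 0.5 * sum(x * x for x in r)
    grad = [[ri * hj for hj in h] for ri in r]        # outer(r, h)
    return loss, grad


# Toy values; h_next is the block's current output, as on a fixed snapshot.
W = [[0.5, -0.2], [0.1, 0.3]]
h = [1.0, 2.0]
h_next = [hi + fi for hi, fi in zip(h, matvec(W, h))]
a_next = [3.0, -4.0]
eta, lr = 0.5, 0.1

target = shift_target(h_next, a_next, eta)
loss0, grad = shift_loss_and_grad(W, h, target)
W_new = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, grad)]
loss1, _ = shift_loss_and_grad(W_new, h, target)
print(loss0, loss1)
```

Because the target starts exactly `eta` away from the current output, the initial loss in this sketch is `0.5 * eta**2`, and the objective is bounded below with its minimum at the target. That gives the update an explicit step size in hidden-state space.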