


# Phase 7A Memo: Snapshot Time Sweep

**Date**: 2026-03-25

## Question
Is "same-batch descent + held-out ascent" a late-snapshot artifact, or does it persist across training?

## Answer: Primarily a late-snapshot artifact. Early snapshots show positive held-out transfer.

### 5-step DeltaLoss results (raw credit, last-block-only):

| Epoch | Acc | DFA dL_held | Vec dL_held | Oracle dL_held | Vec PUR_5 |
|-------|-----|-------------|-------------|----------------|-----------|
| **5** | 0.49 | +0.003 | **-0.005** | **-0.009** | **0.70** |
| 20 | 0.57 | +0.001 | +0.002 | +0.000 | -3.87 |
| 100 | 0.62 | +0.000 | +0.001 | -0.001 | -1.01 |
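
A minimal sketch of how a k-step DeltaLoss measurement like the one tabulated above could be set up, assuming it means "apply k updates computed on one batch, then record the loss change on that same batch and on a held-out batch". The scalar toy model, the `exact_gradient` stand-in for a credit rule, and all function names are hypothetical and are not taken from the experiment code.

```python
import random

def mse(w, batch):
    """Mean squared error of the scalar model y_hat = w * x on a batch of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)

def exact_gradient(w, batch):
    """Exact MSE gradient; a stand-in for whichever credit rule is under test."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def k_step_delta_loss(w, same_batch, held_batch, update_fn, k=5, lr=0.05):
    """Apply k updates computed on same_batch and return (dL_same, dL_held)."""
    loss_same_before = mse(w, same_batch)
    loss_held_before = mse(w, held_batch)
    w_new = w
    for _ in range(k):
        w_new = w_new - lr * update_fn(w_new, same_batch)
    dl_same = mse(w_new, same_batch) - loss_same_before
    dl_held = mse(w_new, held_batch) - loss_held_before
    return dl_same, dl_held

# Toy data (assumption): both batches come from the same noisy linear task.
rng = random.Random(0)
W_TRUE = 1.5

def make_batch(n):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [(x, W_TRUE * x + 0.1 * rng.gauss(0.0, 1.0)) for x in xs]

dl_same, dl_held = k_step_delta_loss(0.0, make_batch(32), make_batch(32), exact_gradient)
print(f"dL_same={dl_same:+.3f}  dL_held={dl_held:+.3f}")
```

Under this convention a negative value means the loss went down, which matches how the table's negative dL_held entries are read in the findings below.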

### Key findings:

1. **At epoch 5, Vec and Oracle both decrease held-out loss** (dL_held of -0.005 and -0.009), while DFA increases it (+0.003). Vec PUR=0.70 means that 70% of the same-batch improvement transfers to held-out data. Oracle PUR=1.05, i.e. more than 100% transfer.

2. **By epoch 20, the generalization window has closed.** At epochs 20 and 100, every method shows a near-zero or positive held-out change (between -0.001 and +0.002).

3. **Better credit → lower update variance.** Vec/Oracle update variance is at least 50x lower than DFA's (0.4-0.8 vs 40-60). Better credit produces more consistent cross-batch updates, not less consistent ones.

4. **DFA never improves held-out loss at any tested snapshot** (epochs 5, 20, and 100). Its updates are noisy enough to sometimes decrease same-batch loss, but they never systematically improve held-out loss.
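
The arithmetic behind findings 1 and 3 can be sketched as follows. This assumes PUR is the ratio of held-out loss change to same-batch loss change, which is consistent with the "70% transfers" reading in finding 1 but is not spelled out in this memo. The function name `transfer_ratio` and the example loss changes are hypothetical; only the variance ranges are taken from finding 3.

```python
def transfer_ratio(dl_same, dl_held):
    """Held-out loss change divided by same-batch loss change (assumed PUR definition)."""
    if dl_same == 0:
        raise ValueError("same-batch change is zero; the ratio is undefined")
    return dl_held / dl_same

# Illustrative values only, chosen to show the sign convention.
helps_held_out = transfer_ratio(-0.010, -0.007)   # positive: held-out loss also drops
hurts_held_out = transfer_ratio(-0.001, +0.002)   # negative: held-out loss rises

# Variance gap implied by the ranges quoted in finding 3.
vec_oracle_var = (0.4, 0.8)
dfa_var = (40.0, 60.0)
ratio_low = dfa_var[0] / vec_oracle_var[1]    # smallest gap: about 50
ratio_high = dfa_var[1] / vec_oracle_var[0]   # largest gap: about 150

print(f"ratio when held-out improves: {helps_held_out:.2f}")
print(f"ratio when held-out worsens:  {hurts_held_out:.2f}")
print(f"variance gap: {ratio_low:.0f}x to {ratio_high:.0f}x")
```

With this definition, a ratio above 1 (as reported for Oracle at epoch 5) means the held-out loss dropped by more than the same-batch loss, and a negative ratio means a same-batch improvement came with a held-out regression.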

## Implications

The "better credit is useless" narrative from Phase 6A/6.5A was wrong on two counts:
1. Same-batch exploitability works (Phase 6.5A)
2. Early-snapshot held-out transfer works too (this experiment)

Online training fails because, by the time the warmup phase ends and the credit bridge takes over (epoch ~20), the network is already past the "generalization window" in which local credit updates are useful. The fix should be: **use the credit bridge from the start (no DFA warmup), or switch earlier.**

## Next step recommendation
Phase 7B (multi-batch averaging) may not be needed, given that the held-out failure is a snapshot-timing issue rather than a batch-variance issue. Instead, the priority should be testing online training with vector credit from epoch 0 (no warmup or a very short warmup).
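
One way to frame the proposed comparison is as a single warmup-length parameter. The sketch below is illustrative only: `credit_source`, the schedule names, and the 2-epoch "short warmup" value are assumptions, not settings from the training code. Only the ~20-epoch warmup and the probed epochs 5 and 20 come from this memo.

```python
def credit_source(epoch, warmup_epochs):
    """Return which credit signal drives updates at a given epoch."""
    if warmup_epochs < 0:
        raise ValueError("warmup_epochs must be non-negative")
    return "dfa" if epoch < warmup_epochs else "vector"

# Candidate schedules to compare (warmup lengths other than 20 are hypothetical).
schedules = {
    "current (~20-epoch warmup)": 20,
    "short warmup": 2,
    "no warmup": 0,
}

for name, warmup in schedules.items():
    sources = [credit_source(epoch, warmup) for epoch in (0, 5, 20)]
    print(f"{name}: {sources}")
```

The point of the comparison is that only the shorter schedules have vector credit active at epoch 5, the one snapshot where held-out transfer was observed.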