# Phase 6.5A Memo: Same-Batch Linesearch

**Date**: 2026-03-25

## Question
Under strict same-batch evaluation, does better credit produce better loss decrease?

## Answer: YES.

Phase 6A's conclusion was wrong; it was an artifact of protocol confounds. With strict same-batch evaluation:

### All blocks, normalized credit (closest to Phase 6A protocol):

| Method | Best eta | dL_same (train batch) | dL_held (held-out batch) |
|--------|---------|---------|---------|
| DFA | 1e-2 | -0.003 | +0.004 |
| ScalarCB | 3e-3 | -0.025 | +0.027 |
| Vec_M4 | 3e-3 | **-0.135** | +0.045 |
| Oracle BP | 1e-2 | **-0.406** | +0.094 |

dL is the change in loss caused by the update (negative = decrease). **The same-batch loss decrease grows monotonically with credit quality**: Oracle BP > Vec_M4 > ScalarCB > DFA.
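The same-batch protocol can be sketched as follows. This is a minimal illustration, not the Phase 6.5A code: the linear model, squared-error loss, eta grid, and helper names (`mse`, `batch_grad`, `same_batch_linesearch`) are hypothetical, and choosing the best eta by dL_same alone is an assumption about how the "best eta" column was selected. For each eta, the update built from the credit is scored on the batch that produced it (dL_same) and on a held-out batch (dL_held). Exact batch gradients stand in for oracle BP credit.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Batch = Sequence[Tuple[Vector, float]]


def mse(w: Vector, batch: Batch) -> float:
    # Mean squared error of a linear model y_hat = w . x on one batch.
    total = 0.0
    for x, y in batch:
        y_hat = sum(wi * xi for wi, xi in zip(w, x))
        total += (y_hat - y) ** 2
    return total / len(batch)


def batch_grad(w: Vector, batch: Batch) -> Vector:
    # Exact gradient of mse w.r.t. w on this batch (stand-in for oracle BP credit).
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi / len(batch)
    return g


def same_batch_linesearch(
    w: Vector,
    credit: Vector,
    train_batch: Batch,
    held_batch: Batch,
    etas: Sequence[float],
) -> Tuple[float, float, float]:
    # Returns (eta, dL_same, dL_held) for the eta with the lowest dL_same.
    # dL = loss after the step minus loss before it (negative = decrease).
    base_same = mse(w, train_batch)
    base_held = mse(w, held_batch)
    results = []
    for eta in etas:
        w_new = [wi - eta * ci for wi, ci in zip(w, credit)]
        d_same = mse(w_new, train_batch) - base_same
        d_held = mse(w_new, held_batch) - base_held
        results.append((eta, d_same, d_held))
    # Selection uses the batch that produced the credit; held-out is only reported.
    return min(results, key=lambda r: r[1])


# Toy data: credit is computed on train_batch, then scored on both batches.
w0 = [0.0, 0.0]
train_batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
held_batch = [([1.0, 1.0], 0.0)]
credit = batch_grad(w0, train_batch)
best = same_batch_linesearch(w0, credit, train_batch, held_batch, [0.1, 0.5, 1.0])
eta, d_same, d_held = best
print(f"best eta={eta}, dL_same={d_same}, dL_held={d_held}")
# best eta=1.0, dL_same=-2.5, dL_held=9.0
```

In this toy, the step that drives the training-batch loss to zero (dL_same = -2.5) raises the held-out loss by 9.0, the same sign pattern this memo labels Case D.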

### But held-out loss increases for every method, only marginally for DFA.

This is **Case D**: the local surrogate correctly exploits credit to decrease training loss, but the update overfits to the batch. The better the credit, the stronger the overfitting: held-out loss rises from +0.004 (DFA) to +0.094 (Oracle BP).

### Key confounds in Phase 6A:
1. **Normalization**: Phase 6A always normalized credit, which amplified DFA's weak signals to the same magnitude as Vec's strong signals, erasing the natural ordering
2. **Held-out evaluation**: Phase 6A evaluated on held-out batches, showing the generalization failure rather than the exploitability success
3. **Gradient clamping**: Phase 6A clamped gradients to [-1, 1], further distorting the relationship
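To see how confounds 1 and 3 can erase magnitude differences, here is a minimal sketch. It assumes that "normalization" means rescaling credit to unit RMS and that clamping is element-wise clipping to [-1, 1]; the helper names and example vectors are hypothetical, and applying normalization before clamping is also an assumption. Two credit vectors that differ only by a factor of 100 in scale become identical after normalization, so any scale-based ordering is lost; clamping then flattens the largest entries. (Real DFA and Vec credit also differ in direction, which this sketch does not model.)

```python
import math
from typing import List


def rms(v: List[float]) -> float:
    # Root-mean-square magnitude of a vector.
    return math.sqrt(sum(x * x for x in v) / len(v))


def normalize_to_unit_rms(credit: List[float], eps: float = 1e-12) -> List[float]:
    # Rescale so that RMS == 1 (up to eps); the original scale is discarded.
    scale = 1.0 / (rms(credit) + eps)
    return [c * scale for c in credit]


def clamp(credit: List[float], lo: float = -1.0, hi: float = 1.0) -> List[float]:
    # Element-wise clipping to [lo, hi].
    return [min(hi, max(lo, c)) for c in credit]


# Hypothetical credit vectors: same direction, 100x different scale.
weak = [1e-5, -2e-5, 3e-5]
strong = [1e-3, -2e-3, 3e-3]

weak_n = normalize_to_unit_rms(weak)
strong_n = normalize_to_unit_rms(strong)
strong_nc = clamp(strong_n)

print(f"raw RMS:        weak={rms(weak):.2e}  strong={rms(strong):.2e}")
print(f"normalized RMS: weak={rms(weak_n):.3f}  strong={rms(strong_n):.3f}")
print("after clamp:", [round(c, 3) for c in strong_nc])
```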

### Raw vs Normalized (all blocks):
| Method | Raw credit: best dL_same | Normalized credit: best dL_same |
|--------|--------------------|---------------------|
| Vec_M4 | -0.005 | -0.135 |
| Oracle BP | -0.003 | -0.406 |

Raw credit produces tiny updates because BP gradients have RMS ≈ 0.00004 (4e-5), so a raw step barely changes the loss. Normalization brings all methods to a comparable magnitude, but it also introduces the overfitting described above.
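A back-of-the-envelope check of the step size, assuming the update is $\Delta W = -\eta\, g$ for credit $g$ and that normalization rescales $g$ to unit RMS (both assumptions; $\eta = 10^{-2}$ is picked for illustration):

$$
\begin{aligned}
\mathrm{RMS}(\Delta W) &= \eta \cdot \mathrm{RMS}(g) \\
\text{raw BP:}\quad 10^{-2} \times 4\times10^{-5} &= 4\times10^{-7} \\
\text{normalized:}\quad 10^{-2} \times 1 &= 10^{-2} \\
\text{ratio:}\quad 10^{-2} \,/\, (4\times10^{-7}) &= 2.5\times10^{4}
\end{aligned}
$$

Under these assumptions, a normalized step at the same eta is roughly 25,000 times larger than a raw BP step, consistent with raw credit barely moving the loss.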

## Revised Diagnosis

The bottleneck is NOT "local surrogate cannot exploit good credit" (Case B from Phase 6A). It IS:
- **Generalization/overfitting**: local surrogate with good credit decreases train loss but increases held-out loss
- This means the project direction should be about **regularizing local updates** rather than replacing the surrogate entirely