# Phase 6.5A Memo: Same-Batch Linesearch
**Date**: 2026-03-25
## Question
Under strict same-batch evaluation, does better credit produce better loss decrease?
## Answer: YES.
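Phase 6A's conclusion was wrong due to protocol confounds. With same-batch evaluation (dL_same is the loss change on the batch used for the update, dL_held is the loss change on a held-out batch; negative values are decreases):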
### All blocks, normalized credit (closest to Phase 6A protocol):
| Method | best eta | dL_same | dL_held |
|--------|---------|---------|---------|
| DFA | 1e-2 | -0.003 | +0.004 |
| ScalarCB | 3e-3 | -0.025 | +0.027 |
| Vec_M4 | 3e-3 | **-0.135** | +0.045 |
| Oracle BP | 1e-2 | **-0.406** | +0.094 |
**The same-batch loss decrease grows monotonically with credit quality**: Oracle > Vec > ScalarCB > DFA.
### But held-out loss increases for every method (only marginally for DFA).
This is **Case D**: the local surrogate correctly exploits credit to decrease training loss, but the update overfits to the batch. The better the credit, the more effective the overfitting.
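The evaluation protocol behind the table can be sketched as follows. This is a minimal illustration, assuming a toy linear model with a squared-error loss in place of the real network; `mse_loss`, `same_batch_linesearch`, and `demo` are hypothetical names, and the eta grid is only an example. It shows how dL_same and dL_held are both measured for the step size that is best on the update batch.

```python
import numpy as np

def mse_loss(w, batch):
    """Mean squared error of a linear model; a stand-in for the real network loss."""
    x, y = batch
    return float(np.mean((x @ w - y) ** 2))

def same_batch_linesearch(w, direction, train_batch, held_batch, etas, loss_fn=mse_loss):
    """Try each step size along `direction`; keep the one with the lowest same-batch loss.

    Returns (best_eta, dL_same, dL_held). Each dL is loss_after - loss_before,
    so a negative value means the loss decreased.
    """
    base_same = loss_fn(w, train_batch)
    base_held = loss_fn(w, held_batch)
    best = None
    for eta in etas:
        w_new = w - eta * direction
        dl_same = loss_fn(w_new, train_batch) - base_same
        dl_held = loss_fn(w_new, held_batch) - base_held
        if best is None or dl_same < best[1]:
            best = (eta, dl_same, dl_held)
    return best

def demo(seed=0):
    """Toy setup (an assumption, not the memo's model): noisy linear regression."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=8)

    def make_batch(n):
        x = rng.normal(size=(n, 8))
        y = x @ w_true + 0.5 * rng.normal(size=n)
        return x, y

    train_batch, held_batch = make_batch(16), make_batch(16)
    w = np.zeros(8)
    x, y = train_batch
    grad = 2.0 * x.T @ (x @ w - y) / len(y)  # exact gradient on the update batch
    return same_batch_linesearch(w, grad, train_batch, held_batch,
                                 etas=[1e-3, 3e-3, 1e-2, 3e-2])
```

Calling `demo()` returns the selected eta together with both loss changes. Selecting eta on the update batch itself is what makes dL_same an optimistic measure: it can be strongly negative while dL_held is positive, which is the Case D pattern.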
### Key confounds in Phase 6A:
1. **Normalization**: Phase 6A always normalized credit, which amplified DFA's weak signals to the same magnitude as Vec's strong signals, erasing the natural ordering
2. **Held-out evaluation**: Phase 6A evaluated on held-out batches, showing the generalization failure rather than the exploitability success
3. **Gradient clamping**: Phase 6A clamped gradients to [-1, 1], further distorting the relationship
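The first and third confounds can be illustrated with a small sketch. It assumes that normalization means rescaling a credit tensor to unit RMS and that clamping is applied elementwise after normalization; neither detail is stated in this memo, and the `weak` and `strong` signals are synthetic placeholders rather than actual DFA or Vec credit.

```python
import numpy as np

def rms(x):
    """Root-mean-square magnitude of an array."""
    return float(np.sqrt(np.mean(np.square(x))))

def normalize_rms(credit, eps=1e-12):
    """Rescale a credit tensor to unit RMS (assumed form of the normalization)."""
    return credit / (rms(credit) + eps)

def clamp(grad, limit=1.0):
    """Elementwise clamp to [-limit, limit]."""
    return np.clip(grad, -limit, limit)

rng = np.random.default_rng(0)
direction = rng.normal(size=1000)
weak = 1e-4 * direction    # hypothetical weak credit signal
strong = 1e-1 * direction  # hypothetical strong credit signal

# Before normalization the two signals differ in magnitude by a factor of 1000.
raw_ratio = rms(strong) / rms(weak)

# After normalization both have unit RMS, so the magnitude ordering is gone.
norm_ratio = rms(normalize_rms(strong)) / rms(normalize_rms(weak))

# Clamping a unit-RMS signal to [-1, 1] alters every element beyond one RMS.
clipped_fraction = float(np.mean(np.abs(normalize_rms(strong)) > 1.0))
clamped = clamp(normalize_rms(strong))
```

Under these assumptions, whatever information the raw magnitude carried about signal strength is discarded before the update is applied, and the clamp then changes the direction of the update as well as its scale.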
### Raw vs Normalized (all blocks):
| Method | raw dL_same (best) | norm dL_same (best) |
|--------|--------------------|---------------------|
| Vec_M4 | -0.005 | -0.135 |
| Oracle | -0.003 | -0.406 |
Raw credit produces tiny updates because BP gradients have RMS ≈ 0.00004. Normalization brings all methods to comparable magnitude but introduces overfitting.
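The scale gap between the two columns follows from simple arithmetic, sketched here under the assumption that normalization rescales credit to unit RMS (the exact normalization target is not stated in this memo). `step_rms` is a hypothetical helper, and the eta value is taken from the normalized table only as an example.

```python
def step_rms(eta, credit_rms):
    """Typical per-element update size for a plain step of the form eta * credit."""
    return eta * credit_rms

raw_rms = 4e-5   # RMS of BP gradients reported above
eta = 1e-2       # example step size from the normalized table

raw_step = step_rms(eta, raw_rms)   # about 4e-7 per element
norm_step = step_rms(eta, 1.0)      # about 1e-2 per element, assuming unit RMS
amplification = norm_step / raw_step
```

At any fixed eta the normalized update is about 25,000 times larger than the raw one under this assumption, which is consistent with the raw column showing only small loss changes.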
## Revised Diagnosis
The bottleneck is NOT "local surrogate cannot exploit good credit" (Case B from Phase 6A). It IS:
- **Generalization/overfitting**: local surrogate with good credit decreases train loss but increases held-out loss

This means the project direction should be **regularizing local updates** rather than replacing the surrogate entirely.