| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-06-14 20:32:31 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-06-14 20:32:31 -0500 |
| commit | 1118b7457c261de36ead6103503c00c321c75f9b (patch) | |
| tree | 7ea76b32f070cb58458caaa2897a5d8133561f48 /experiments/__pycache__ | |
| parent | aa73718eb6427d7da3b9cb416275802d90c4b2ed (diff) | |
Appendix experiment triangulating the depth-utility diagnostic (D3) by varying
the number of trainable residual blocks k (last-k trainable, first L-k frozen at
init; embed/LN/head always trained).
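
The last-k schedule described above can be sketched as a per-block trainability mask. This is only an illustration: `trainable_blocks` and `ladder` are hypothetical names, not functions from this repository, and the sweep over every k from 0 to L is an assumption. The embedding, LayerNorm, and head parameters sit outside the mask because they are always trained.

```python
def trainable_blocks(num_blocks: int, k: int) -> list[bool]:
    """Return one flag per residual block; True means the block is trained."""
    if not 0 <= k <= num_blocks:
        raise ValueError("k must satisfy 0 <= k <= num_blocks")
    # The first num_blocks - k blocks stay frozen at init; the last k are trained.
    return [i >= num_blocks - k for i in range(num_blocks)]


def ladder(num_blocks: int) -> dict[int, list[bool]]:
    """Masks for every rung k = 0..num_blocks (assumed sweep range)."""
    return {k: trainable_blocks(num_blocks, k) for k in range(num_blocks + 1)}


if __name__ == "__main__":
    # Print the rungs for a four-block stack (the L=4 configuration).
    for k, mask in ladder(4).items():
        print(k, mask)
```

In a real training script each flag would typically gate gradient updates for the corresponding block's parameters; k=0 corresponds to the fully frozen baseline.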
- d=256 L=4 and d=512 L=2, 3 seeds, recipe identical to the main audit.
- BP climbs monotonically (+22-23pp); DFA peaks at the frozen baseline (k=0) and
declines once any deep block is trained; FA shows partial/no net depth utility.
- Cross-checks reproduce existing anchors (BP 0.617, DFA 0.301, FA 0.402, frozen 0.349).
- frozen_init_identity_check shows that the frozen stack acts as a
near-norm-preserving random feature map (per-block ||f||/||h|| ≈ 0.10, stack
cos 0.981), which explains the above-chance k=0 rung.
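
A toy version of the two statistics in the last bullet, assuming a plain residual update h <- h + f where f is a random direction rescaled to 0.10 * ||h||. This is not the repository's frozen_init_identity_check; the helper names and the random-direction model are assumptions made for illustration. If the four updates are roughly orthogonal to h and to each other, the input/output cosine should be about 1/sqrt(1.01^4), roughly 0.98, which is in the same range as the reported stack cosine.

```python
import math
import random


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def random_residual_stack(h0, num_blocks, rel_norm, rng):
    """Apply h <- h + f per block, with f random and ||f|| = rel_norm * ||h||."""
    h = list(h0)
    ratios = []
    for _ in range(num_blocks):
        f = [rng.gauss(0.0, 1.0) for _ in h]
        scale = rel_norm * norm(h) / norm(f)
        f = [scale * x for x in f]
        ratios.append(norm(f) / norm(h))  # per-block ||f|| / ||h||
        h = [a + b for a, b in zip(h, f)]
    return h, ratios


if __name__ == "__main__":
    rng = random.Random(0)
    h0 = [rng.gauss(0.0, 1.0) for _ in range(256)]  # d=256, as in one config
    hL, ratios = random_residual_stack(h0, num_blocks=4, rel_norm=0.10, rng=rng)
    print("per-block ratios:", [round(r, 3) for r in ratios])
    print("stack cosine:", round(cosine(h0, hL), 3))
    print("norm growth:", round(norm(hL) / norm(h0), 3))
```

Under these assumptions the stack output stays close to its input in both direction and norm, which is what makes the frozen blocks behave like a mild random feature map rather than scrambling the embedding.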
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'experiments/__pycache__')
0 files changed, 0 insertions, 0 deletions
