path: root/records/track_10min_16mb/2026-03-19_WarmdownQuantization
Age | Commit message | Author
30 hours | Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61) | Sam Larson
* Warmdown-quantization co-optimization, val_bpb=1.2154

  Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces the int8 quantization penalty from 0.014 to 0.005 BPB. Combined with FP16 tied embeddings and moderate NTK-RoPE extrapolation (eval@1408). Full warmdown sweep across 10 values and detailed analysis in README.

* breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256

---------

Co-authored-by: Sam Larson <saml212@users.noreply.github.com>
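The move from int8 to int6 weights can be illustrated with a minimal sketch of symmetric, per-row, round-to-nearest quantization. This is an assumption for illustration only: the log does not show the record's actual quantizer, and `quantize_rows`, `dequantize_rows` and `mean_squared_error` are hypothetical helpers. The sketch only compares reconstruction error on random Gaussian weights; it does not pack the 6-bit codes into bytes.

```python
import random


def quantize_rows(weights, bits):
    """Hypothetical symmetric quantizer with one float scale per row.

    Signed codes are clamped to [-(2**(bits-1) - 1), 2**(bits-1) - 1],
    i.e. [-127, 127] for int8 and [-31, 31] for int6.
    """
    qmax = 2 ** (bits - 1) - 1
    codes, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # An all-zero row gets a dummy scale so we never divide by zero.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        codes.append([max(-qmax, min(qmax, round(w / scale))) for w in row])
        scales.append(scale)
    return codes, scales


def dequantize_rows(codes, scales):
    """Map integer codes back to floats using each row's scale."""
    return [[q * s for q in row] for row, s in zip(codes, scales)]


def mean_squared_error(a, b):
    """Mean squared difference over all entries of two equal-shape matrices."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-in for a weight matrix (64 rows x 256 columns).
    weights = [[rng.gauss(0.0, 0.02) for _ in range(256)] for _ in range(64)]
    for bits in (8, 6):
        codes, scales = quantize_rows(weights, bits)
        err = mean_squared_error(weights, dequantize_rows(codes, scales))
        print(f"int{bits}: MSE = {err:.3e}")
```

With the same per-row scaling, the int6 grid step is 127/31, roughly 4.1 times coarser than int8, so its rounding error is correspondingly larger, while each weight takes 6 bits instead of 8 (25% fewer bits for the same parameter count).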