| Age | Commit message (Collapse) | Author | |
|---|---|---|---|
| 30 hours | Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61) | Sam Larson | |
| * Warmdown-quantization co-optimization, val_bpb=1.2154 Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces int8 quantization penalty from 0.014 to 0.005 BPB. Combined with FP16 tied embeddings and moderate NTK-RoPE extrapolation (eval@1408). Full warmdown sweep across 10 values and detailed analysis in README. * breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256 --------- Co-authored-by: Sam Larson <saml212@users.noreply.github.com> | |||
