| author | Sam Larson <166414725+saml212@users.noreply.github.com> | 2026-03-19 14:28:57 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-03-19 14:28:57 -0700 |
| commit | 555669e8330472143139c2f82bba15baab1a5e0d (patch) | |
| tree | 71984b3355769e6e21e374a4f39327463ca1de02 /records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json | |
| parent | ad7b62c714740d65bb714e5ca530eebe9c8cedac (diff) | |
Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61)
* Warmdown-quantization co-optimization, val_bpb=1.2154
Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces int8 quantization
penalty from 0.014 to 0.005 BPB. Combined with FP16 tied embeddings and moderate
NTK-RoPE extrapolation (eval@1408).
Full warmdown sweep across 10 values and detailed analysis in README.
* breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256
---------
Co-authored-by: Sam Larson <saml212@users.noreply.github.com>
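
The "quantization penalty" in the message above is the BPB gap between the full-precision model and its quantized copy. The commit page does not show the quantizer, so the following is only a minimal sketch of one common scheme (symmetric rounding with a single scale per row), used to illustrate why fewer bits mean a larger round-trip error. The function names, the Gaussian test row, and the per-row scale are all assumptions for illustration, not the submission's code. Measuring the penalty in BPB would require evaluating the actual model.

```python
import random

def quantize_symmetric(weights, bits):
    """Round floats to signed integer codes of the given width, one shared scale.

    Hypothetical helper: the scheme used by the submission is not shown in the commit.
    """
    qmax = 2 ** (bits - 1) - 1            # 31 for 6 bits, 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]

def rms_error(weights, bits):
    """Root-mean-square difference between weights and their quantized round trip."""
    codes, scale = quantize_symmetric(weights, bits)
    restored = dequantize(codes, scale)
    return (sum((w - r) ** 2 for w, r in zip(weights, restored)) / len(weights)) ** 0.5

# Synthetic stand-in for one weight row (assumed distribution, not real model weights).
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(4096)]

err6 = rms_error(row, 6)
err8 = rms_error(row, 8)
print(f"int6 RMS error: {err6:.6f}")
print(f"int8 RMS error: {err8:.6f}")
```

Under this scheme the per-weight error is bounded by half the scale, and the scale grows as the bit width shrinks, which is why a narrower format needs weights that tolerate coarser rounding.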
Diffstat (limited to 'records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json')
| -rw-r--r-- | records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json | 11 |
1 files changed, 11 insertions, 0 deletions
diff --git a/records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json b/records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json
new file mode 100644
index 0000000..07c58c3
--- /dev/null
+++ b/records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json
@@ -0,0 +1,11 @@
+{
+  "author": "samuellarson",
+  "github_id": "samuellarson",
+  "name": "Int6 MLP3x Sliding Window",
+  "blurb": "Int6 post-training quantization enables 3x MLP expansion (21.8M params in 16MB). Combined with train@2048 + sliding window eval + FP16 tied embeddings + Late-K passthrough.",
+  "date": "2026-03-20",
+  "val_loss": 1.95428963,
+  "val_bpb": 1.15744040,
+  "bytes_total": 15977717,
+  "bytes_code": 51200
+}
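
The size fields in the added `submission.json` can be sanity-checked with a little arithmetic. The sketch below assumes that "16mb" in the track name means a limit of 16,000,000 bytes, and it takes the parameter count as the rounded 21.8M quoted in the blurb; neither is confirmed by this commit page. `SIZE_LIMIT_BYTES`, `APPROX_PARAMS`, and `summarize` are hypothetical names introduced here.

```python
import json

SIZE_LIMIT_BYTES = 16_000_000   # assumption: decimal megabytes, not stated in the commit
APPROX_PARAMS = 21_800_000      # rounded figure from the "blurb" field

# Numeric fields copied from the submission.json added by this commit.
record = json.loads("""
{
  "val_loss": 1.95428963,
  "val_bpb": 1.15744040,
  "bytes_total": 15977717,
  "bytes_code": 51200
}
""")

def summarize(rec, limit=SIZE_LIMIT_BYTES):
    """Derive headroom and non-code size from a submission record."""
    return {
        "fits": rec["bytes_total"] <= limit,
        "headroom_bytes": limit - rec["bytes_total"],
        "bytes_non_code": rec["bytes_total"] - rec["bytes_code"],
    }

summary = summarize(record)

# Size of the parameters alone if every one were packed at exactly 6 bits.
raw_int6_bytes = APPROX_PARAMS * 6 // 8

print(summary)
print("raw 6-bit packing:", raw_int6_bytes, "bytes")
```

With these assumptions the record leaves 22,283 bytes of headroom. Plain 6-bit packing of 21.8M parameters would need about 16.35 million bytes, which is more than `bytes_total`, so the stored artifact presumably relies on something beyond fixed-width packing (compression, for example); the commit page does not say which.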
