parameter-golf.git

author	Sam Larson <166414725+saml212@users.noreply.github.com>	2026-03-19 14:28:57 -0700
committer	GitHub <noreply@github.com>	2026-03-19 14:28:57 -0700
commit	555669e8330472143139c2f82bba15baab1a5e0d (patch)
tree	71984b3355769e6e21e374a4f39327463ca1de02 /data/tokenizer_specs.json
parent	ad7b62c714740d65bb714e5ca530eebe9c8cedac (diff)

Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61)

* Warmdown-quantization co-optimization, val_bpb=1.2154

  Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces int8 quantization penalty from 0.014 to 0.005 BPB. Combined with FP16 tied embeddings and moderate NTK-RoPE extrapolation (eval@1408). Full warmdown sweep across 10 values and detailed analysis in README.

* breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256

---------

Co-authored-by: Sam Larson <saml212@users.noreply.github.com>
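
For readers unfamiliar with the "quantization penalty" mentioned in the message, the following is a minimal illustrative sketch, not code from this commit. It assumes symmetric per-tensor quantization, which the commit message does not confirm, and measures round-trip weight error rather than val_bpb. The helper names (`quantize_dequantize`, `rms_error`) and the random weights are hypothetical.

```python
import random


def quantize_dequantize(weights, bits):
    """Round-trip floats through signed `bits`-bit integers (symmetric, per-tensor).

    Assumption: one shared scale for the whole tensor; the commit may use a
    different scheme (e.g. per-row scales).
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 31 for int6
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Quantize to an integer grid, clamp to the representable range, dequantize.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def rms_error(weights, bits):
    """Root-mean-square difference between the original and round-tripped weights."""
    deq = quantize_dequantize(weights, bits)
    return (sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)) ** 0.5


# Hypothetical stand-in for a weight matrix: small Gaussian values.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]

err8 = rms_error(weights, 8)
err6 = rms_error(weights, 6)
print(f"int8 RMS error: {err8:.6f}")
print(f"int6 RMS error: {err6:.6f}")
```

The int6 grid is roughly four times coarser than the int8 grid, so its round-trip error is larger. The BPB penalties quoted in the message are the downstream effect of this kind of error on validation loss, which this sketch does not measure.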

Diffstat (limited to 'data/tokenizer_specs.json')

0 files changed, 0 insertions, 0 deletions

