parameter-golf.git

author	Renier Velazco <renier.velazco94@gmail.com>	2026-03-19 13:16:50 -0400
committer	GitHub <noreply@github.com>	2026-03-19 10:16:50 -0700
commit	a5eb9edbfb7391a5e323cd062222ad9bfe846974 (patch)
tree	098fb8d21c55cdf4879934054be22db2b650c72c /train_gpt_mlx.py
parent	2081ba1cb7c779b3aedbf728bf4448f772083ce2 (diff)

fp16 tied embedding + lr/warmdown tuning — val_bpb 1.2197 (#42)

Keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink the MLP hidden size to 992 to fit under 16MB, and bump warmdown to 3600 and matrix LR to 0.06. Tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
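For illustration, the fp16 passthrough described above might look roughly like the sketch below. It is not the code from train_gpt_mlx.py: the helper names (to_fp16, quantize_row_int8, export_state), the symmetric per-row int8 scheme, and the example tensor name mlp.fc.weight are all assumptions made for this example. Only tok_emb.weight comes from the commit message. It uses plain Python lists so it runs without any array library.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE half-precision value.

    Note: struct raises OverflowError for magnitudes above the fp16 range.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_row_int8(row):
    """Symmetric per-row int8 quantization (assumed scheme): (values, scale)."""
    max_abs = max((abs(v) for v in row), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale


def export_state(state, keep_fp16=("tok_emb.weight",)):
    """Export a {name: 2-D list} state, skipping int8 for names in keep_fp16."""
    out = {}
    for name, matrix in state.items():
        if name in keep_fp16:
            # Passthrough: store at half precision instead of quantizing.
            data = [[to_fp16(v) for v in row] for row in matrix]
            out[name] = {"dtype": "fp16", "data": data}
        else:
            rows = [quantize_row_int8(row) for row in matrix]
            out[name] = {
                "dtype": "int8",
                "data": [q for q, _ in rows],
                "scales": [s for _, s in rows],
            }
    return out


# Hypothetical two-tensor state; "mlp.fc.weight" is a made-up name.
state = {
    "tok_emb.weight": [[0.1, -0.2]],
    "mlp.fc.weight": [[1.27, -0.5, 0.0]],
}
exported = export_state(state)
```

The trade-off is size: a tensor kept in fp16 costs two bytes per weight rather than one, which is consistent with the commit shrinking the MLP hidden size to stay under the 16MB limit.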

Diffstat (limited to 'train_gpt_mlx.py')

0 files changed, 0 insertions, 0 deletions

