path: root/train_gpt_mlx.py
author Renier Velazco <renier.velazco94@gmail.com> 2026-03-19 13:16:50 -0400
committer GitHub <noreply@github.com> 2026-03-19 10:16:50 -0700
commit a5eb9edbfb7391a5e323cd062222ad9bfe846974
tree 098fb8d21c55cdf4879934054be22db2b650c72c /train_gpt_mlx.py
parent 2081ba1cb7c779b3aedbf728bf4448f772083ce2
fp16 tied embedding + lr/warmdown tuning — val_bpb 1.2197 (#42)
keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600 and matrix LR to 0.06. tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
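
The export change described in the message can be illustrated with a minimal sketch. This is not the code from train_gpt_mlx.py: the helper names (FP16_KEEP, to_int8, export_state, dequantize), the symmetric per-tensor int8 scheme, and the second tensor name in the usage note are all assumptions made for illustration. Only the idea of leaving tok_emb.weight in fp16 while other tensors go to int8 comes from the commit message.

```python
import struct

# Hypothetical allow-list: tensors exported as fp16 instead of int8.
FP16_KEEP = {"tok_emb.weight"}

def to_fp16_bytes(values):
    # IEEE 754 half precision, 2 bytes per weight.
    return struct.pack(f"<{len(values)}e", *values)

def to_int8(values):
    # Assumed scheme: symmetric per-tensor quantization, 1 byte per weight
    # plus a single float scale.
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return struct.pack(f"<{len(q)}b", *q), scale

def export_state(state):
    # state maps tensor name -> flat list of floats.
    out = {}
    for name, values in state.items():
        if name in FP16_KEEP:
            out[name] = {"dtype": "fp16", "data": to_fp16_bytes(values)}
        else:
            data, scale = to_int8(values)
            out[name] = {"dtype": "int8", "data": data, "scale": scale}
    return out

def dequantize(entry):
    # Reverse of export_state for a single tensor entry.
    if entry["dtype"] == "fp16":
        n = len(entry["data"]) // 2
        return list(struct.unpack(f"<{n}e", entry["data"]))
    n = len(entry["data"])
    return [q * entry["scale"] for q in struct.unpack(f"<{n}b", entry["data"])]
```

Calling export_state({"tok_emb.weight": [...], "mlp.fc.weight": [...]}) (the second name is made up) stores the embedding at 2 bytes per weight and everything else at 1 byte per weight. The embedding costs twice the space of its int8 form, which is consistent with the message pairing this change with a smaller MLP hidden size to stay under the size limit.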
Diffstat (limited to 'train_gpt_mlx.py')
0 files changed, 0 insertions, 0 deletions