Age | Commit message | Author
2026-03-19 | Update README.md (#105) | Will DePue
2026-03-19 | fp16 tied embedding + lr/warmdown tuning — val_bpb 1.2197 (#42) | Renier Velazco
    keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600 and matrix LR to 0.06. tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds).
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-19 | Merge pull request #100 from sandsevenone/mlx_eager_eval | Will DePue
    Use eager mx.eval() to fix running train script on 16GB Mac devices
2026-03-20 | Add MLX_EAGER_EVAL flag to further reduce memory pressure by force-evaluating the graph after each sub-batch step | sandrone
2026-03-18 | Update README.md | Will DePue
2026-03-18 | Update train_gpt_mlx.py | Will DePue
2026-03-18 | Update train_gpt.py | Will DePue
2026-03-18 | Merge pull request #35 from openai/0hq-patch-1 | Will DePue
    Update README.md
2026-03-18 | Update README.md | Will DePue
2026-03-18 | Merge pull request #32 from yhn112/fix-mlx-eval-memory-growth | Will DePue
    Fix MLX multi-batch validation memory growth
2026-03-18 | Merge pull request #9 from oof-baroomf/patch-1 | Will DePue
    Update README typo
2026-03-18 | Merge pull request #18 from berniwal/main | Will DePue
    MLX Timing Mismatch with Main Script
2026-03-19 | Log MLX validation progress | Michael Diskin
2026-03-19 | Fix MLX validation loss accumulation | Michael Diskin
2026-03-18 | match timing to main script to exclude eval timing | bernhardwalser
2026-03-18 | Update README typo | Dhruv Saini
2026-03-18 | Remove scripts | Will DePue
2026-03-18 | Update README.md | Will DePue
2026-03-18 | Launch snapshot | Will DePue