| Age | Commit message | Author | |
|---|---|---|---|
| 33 hours | Update README.md | Will DePue | |
| 33 hours | New SOTA attempt (#52) | spokane-way | |
| | Co-authored-by: spokane-way <spokane@way> | | |
| 33 hours | Update README.md | Will DePue | |
| 33 hours | Fix: score final partial window in sliding window eval (#124) | Matthew Li | |
| | The window_starts filter dropped windows shorter than stride, silently skipping up to (stride-1) tokens at the end of the validation set. Now includes all windows with >= 1 scoreable token, and clamps the score start for short final windows. | | |
| 36 hours | Add record: Sliding Window Eval (stride=64), val_bpb=1.1925 (#50) | Matthew Li | |
| 36 hours | Update README.md | Will DePue | |
| 36 hours | SOTA attempt (val_bpb=1.2064) (#49) | spokane-way | |
| | SOTA attempt; Improve score on SXM. Co-authored-by: spokane-way <spokane@way> | | |
| 36 hours | clarify torch version | Alex | |
| 36 hours | Update README.md (#105) | Will DePue | |
| 36 hours | fp16 tied embedding + lr/warmdown tuning — val_bpb 1.2197 (#42) | Renier Velazco | |
| | keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600 and matrix LR to 0.06. tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> | | |
| 36 hours | Merge pull request #100 from sandsevenone/mlx_eager_eval | Will DePue | |
| | Use eager mx.eval() to fix running train script on 16GB Mac devices | | |
| 37 hours | Add MLX_EAGER_EVAL flag to further reduce memory pressure by force-evaluating the graph after each sub-batch step | sandrone | |
| 2 days | Update README.md | Will DePue | |
| 2 days | Update train_gpt_mlx.py | Will DePue | |
| 2 days | Update train_gpt.py | Will DePue | |
| 2 days | Merge pull request #35 from openai/0hq-patch-1 | Will DePue | |
| | Update README.md | | |
| 2 days | Update README.md | Will DePue | |
| 2 days | Merge pull request #32 from yhn112/fix-mlx-eval-memory-growth | Will DePue | |
| | Fix MLX multi-batch validation memory growth | | |
| 2 days | Merge pull request #9 from oof-baroomf/patch-1 | Will DePue | |
| | Update README typo | | |
| 2 days | Merge pull request #18 from berniwal/main | Will DePue | |
| | MLX Timing Mismatch with Main Script | | |
| 2 days | Log MLX validation progress | Michael Diskin | |
| 2 days | Fix MLX validation loss accumulation | Michael Diskin | |
| 2 days | match timing to main script to exclude eval timing | bernhardwalser | |
| 2 days | Update README typo | Dhruv Saini | |
| 2 days | Remove scripts | Will DePue | |
| 3 days | Update README.md | Will DePue | |
| 3 days | Launch snapshot | Will DePue | |
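
The sliding window eval fix (#124) in the log above describes a filter that dropped the final window when it was shorter than the stride. The following is a minimal sketch of that kind of bug and its fix, not the repository's code: `score_ranges`, `tokens_scored`, and the `include_partial` flag are hypothetical names, and the sketch tracks the first unscored position instead of reproducing the patch's clamping logic.

```python
def score_ranges(total_tokens, seq_len, stride, include_partial=True):
    """Plan sliding-window scoring (hypothetical helper, for illustration).

    Returns (window_start, window_end, score_start, score_end) tuples,
    all half-open. Each window scores only tokens no earlier window scored.
    """
    if stride <= 0 or seq_len < stride:
        raise ValueError("need 0 < stride <= seq_len")
    ranges = []
    scored_upto = 0  # first position that no earlier window has scored
    for ws in range(0, total_tokens, stride):
        end = min(ws + seq_len, total_tokens)
        new_tokens = end - scored_upto
        if new_tokens <= 0:
            break  # every token is already scored
        if new_tokens < stride and not include_partial:
            break  # old behaviour: the short final window is dropped
        ranges.append((ws, end, scored_upto, end))
        scored_upto = end
    return ranges


def tokens_scored(ranges):
    """Count the tokens covered by the score ranges."""
    return sum(score_end - score_start for _, _, score_start, score_end in ranges)


# Assumed example sizes: 1000 validation tokens, 256-token windows, stride 64.
total, seq_len, stride = 1000, 256, 64
dropped = tokens_scored(score_ranges(total, seq_len, stride, include_partial=False))
fixed = tokens_scored(score_ranges(total, seq_len, stride, include_partial=True))
print(total - dropped, total - fixed)
```

With these assumed sizes, the dropping variant leaves the last 40 tokens unscored, while the variant that keeps any window with at least one scoreable token covers all 1000.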
