Age      | Commit message | Author
32 hours | Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) (#60) | notapplica
           * Add NTK Eval + Overtone Init submission (1.2160 BPB): train@1024 with overtone embedding
             init and phase-transition residual mixing, eval@2048 with NTK-aware dynamic RoPE scaling.
             Mean val_bpb 1.2160 across 3 seeds (p=0.0012 for the 0.0194-nat improvement over baseline).
           * Update submission: Muon WD + NTK Eval + Overtone Init (1.2094 BPB, p=0.0002)
           * Update submission: 10-Layer + Muon WD + NTK Eval + Overtone Init (1.2029 BPB, p=0.0006)
           * Update submission: FP16 Embed + 10L + Muon WD + NTK + Overtone (1.2008 BPB)
           * Update submission: 1.2000 BPB — FP16 Embed + 10L + Muon WD + NTK@1408 + Overtone
           * Update: 1.1748 BPB — Sliding Window + FP16 Embed + 10L + Muon WD + Overtone
           Co-authored-by: notapplica <notapplica@users.noreply.github.com>
           Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
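The "NTK-aware dynamic RoPE scaling" used to eval@2048 a model trained@1024 can be sketched with the commonly published NTK-aware formula, which stretches the RoPE base by `s**(d/(d-2))` when the sequence exceeds the training length. This is a generic illustration under that assumption, not this repo's code; the function name `rope_inv_freq` and its signature are hypothetical.

```python
def rope_inv_freq(head_dim, seq_len, train_len=1024, base=10000.0):
    """Inverse RoPE frequencies with NTK-aware dynamic scaling.

    If seq_len exceeds the training context, the base is stretched by
    s**(d/(d-2)) with s = seq_len/train_len, interpolating the low
    frequencies while leaving the highest frequency nearly untouched.
    """
    if seq_len > train_len:
        s = seq_len / train_len
        base = base * s ** (head_dim / (head_dim - 2))
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
```

At or below the training length the frequencies are unchanged; beyond it, every dimension except the first rotates more slowly, which is what lets the 1024-trained model read 2048-token windows.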
32 hours | Update README.md | Will DePue
32 hours | Update README.md | Will DePue
33 hours | Update README.md | Will DePue
33 hours | New SOTA attempt (#52) | spokane-way
           Co-authored-by: spokane-way <spokane@way>
33 hours | Update README.md | Will DePue
33 hours | Fix: score final partial window in sliding window eval (#124) | Matthew Li
           The window_starts filter dropped windows shorter than the stride, silently skipping up to
           (stride - 1) tokens at the end of the validation set. The eval now includes every window
           with at least one scoreable token and clamps the score start for short final windows.
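The window scheduling this fix describes can be sketched as follows; a minimal illustration of the clamping behavior, not the repo's actual eval code, and `sliding_window_spans` with its signature is a hypothetical name.

```python
def sliding_window_spans(n_tokens, window, stride):
    """Return (win_start, score_start, win_end) triples for sliding-window eval.

    Windows of length `window` advance by `stride`; each window scores only
    the tokens no earlier window has scored, so every position is counted
    exactly once. A short final window is kept (with its score start clamped)
    instead of being filtered out, so the last tokens are not skipped.
    """
    spans = []
    prev_end = 0  # everything before this index is already scored
    for win_start in range(0, n_tokens, stride):
        win_end = min(win_start + window, n_tokens)
        score_start = max(win_start, prev_end)  # clamp for short final windows
        if score_start < win_end:  # keep any window with >= 1 scoreable token
            spans.append((win_start, score_start, win_end))
            prev_end = win_end
        if win_end == n_tokens:
            break
    return spans
```

The old filter corresponds to dropping spans whose `win_end - score_start` is below `stride`; the clamp makes the union of scored ranges cover the full validation set regardless of how `n_tokens` divides by the stride.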
36 hours | Add record: Sliding Window Eval (stride=64), val_bpb=1.1925 (#50) | Matthew Li
36 hours | Update README.md | Will DePue
36 hours | SOTA attempt (val_bpb=1.2064) (#49) | spokane-way
           * SOTA attempt
           * Improve score on SXM
           Co-authored-by: spokane-way <spokane@way>
36 hours | clarify torch version | Alex
36 hours | Update README.md (#105) | Will DePue
36 hours | fp16 tied embedding + lr/warmdown tuning — val_bpb 1.2197 (#42) | Renier Velazco
           Keep tok_emb.weight in fp16 during int8 export (kills the quant gap), shrink the MLP
           hidden size to 992 to fit under 16 MB, and bump warmdown to 3600 and the matrix LR
           to 0.06. Tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds).
           Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
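The selective-precision export this commit describes can be illustrated with a generic symmetric per-tensor int8 quantizer that skips the tied embedding; this is a sketch under that assumption, not the repo's exporter, and `export_int8`, `keep_fp16`, and the parameter names are hypothetical.

```python
import numpy as np

def export_int8(params, keep_fp16=("tok_emb.weight",)):
    """Quantize weights to symmetric per-tensor int8, except tensors listed
    in keep_fp16, which are stored as fp16. Keeping a tied embedding in
    fp16 avoids quantization error hitting both the input lookup and the
    output projection, at the cost of 2 bytes/weight instead of 1."""
    out = {}
    for name, w in params.items():
        if name in keep_fp16:
            out[name] = w.astype(np.float16)
        else:
            scale = max(np.abs(w).max() / 127.0, 1e-8)
            q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
            out[name] = (q, np.float32(scale))
    return out
```

Dequantizing an int8 tensor is `q.astype(np.float32) * scale`; the reconstruction error per weight is at most half a quantization step.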
36 hours | Merge pull request #100 from sandsevenone/mlx_eager_eval | Will DePue
           Use eager mx.eval() to fix running the train script on 16 GB Mac devices
37 hours | Add MLX_EAGER_EVAL flag to further reduce memory pressure by force-evaluating the graph after each sub-batch step | sandrone
2 days   | Update README.md | Will DePue
2 days   | Update train_gpt_mlx.py | Will DePue
2 days   | Update train_gpt.py | Will DePue
2 days   | Merge pull request #35 from openai/0hq-patch-1 | Will DePue
           Update README.md
2 days   | Update README.md | Will DePue
2 days   | Merge pull request #32 from yhn112/fix-mlx-eval-memory-growth | Will DePue
           Fix MLX multi-batch validation memory growth
2 days   | Merge pull request #9 from oof-baroomf/patch-1 | Will DePue
           Update README typo
2 days   | Merge pull request #18 from berniwal/main | Will DePue
           MLX Timing Mismatch with Main Script
2 days   | Log MLX validation progress | Michael Diskin
2 days   | Fix MLX validation loss accumulation | Michael Diskin
2 days   | match timing to main script to exclude eval timing | bernhardwalser
2 days   | Update README typo | Dhruv Saini
2 days   | Remove scripts | Will DePue
3 days   | Update README.md | Will DePue
3 days   | Launch snapshot | Will DePue