Age | Commit message | Author
30 hours | Update README.md (HEAD, main) | Alex Zhao
31 hours | Update README.md | Will DePue
31 hours | commit ttt record (#77) | Sam Acquaviva
31 hours | Record: 10L Mixed Precision: val_bpb=1.2147 (10 layers + int6 middle layers) (#39) | Nan Liu
    * Add Lower LR submission: val_bpb=1.2230 (MATRIX_LR=0.02)
      A systematic LR sweep showed the default Muon/Adam learning rates (0.04)
      were too high. MATRIX_LR=0.02, SCALAR_LR=0.02, TIED_EMBED_LR=0.03 gives a
      consistent improvement. Same 9L/512d architecture, no other changes.
    * Add 10L Mixed Precision submission: val_bpb=1.2147
      10 transformer layers (vs. baseline 9) with mixed int8/int6 compression:
      - Full int8 for the first/last 3 layers (precision-sensitive)
      - Int6 (step=4 rounding) for middle layers 3-6 (compression-friendly)
      - Lower LR: MATRIX_LR=0.02, SCALAR_LR=0.02, TIED_EMBED_LR=0.03
      - Artifact: 15,928,974 bytes (under the 16MB cap)
      - Improvement: 0.0097 bpb / 0.0217 nats over baseline (1.2244)
      Also adds PRUNE_RATIO and INT4_LAYERS/INT4_STEP support to train_gpt.py
      for mixed-precision post-training quantization.
    * Revert root train_gpt.py to upstream baseline
      The root script should remain the baseline. Submission-specific
      modifications (PRUNE_RATIO, INT4_LAYERS, INT4_STEP) belong only in the
      records/ folder copy.
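A minimal sketch of the mixed int8/int6 post-training quantization scheme this record describes. The function names and the per-layer step rule below are illustrative, not the repo's actual API (the record's real code lives in the records/ folder copy):

```python
def quantize(weights, step=1):
    """Symmetric int8 quantization. step=4 snaps codes to multiples of 4,
    leaving 63 distinct levels (~6 bits) inside the int8 range."""
    qmax = (127 // step) * step                # largest representable code
    amax = max((abs(w) for w in weights), default=0.0)
    scale = amax / qmax if amax else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale / step) * step))
             for w in weights]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def layer_step(i):
    # Middle layers 3-6 use the coarser int6-style grid; outer layers
    # (precision-sensitive) keep full int8.
    return 4 if 3 <= i <= 6 else 1
```

With step=4 the artifact spends the same 1 byte per weight but the codes compress better, which is presumably how the middle layers trade precision for size.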
31 hours | Update README.md | Will DePue
31 hours | Update README.md | Will DePue
31 hours | Update README.md | Will DePue
32 hours | Update README.md | Will DePue
32 hours | Update README.md | Will DePue
32 hours | Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61) | Sam Larson
    * Warmdown-quantization co-optimization, val_bpb=1.2154
      Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces the
      int8 quantization penalty from 0.014 to 0.005 BPB. Combined with FP16
      tied embeddings and moderate NTK-RoPE extrapolation (eval@1408). Full
      warmdown sweep across 10 values and detailed analysis in the README.
    * breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256
    Co-authored-by: Sam Larson <saml212@users.noreply.github.com>
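The "warmdown" knob swept in this record is a decay tail on the learning rate. A minimal sketch, assuming a constant-then-linear-warmdown schedule (the total iteration count and base LR here are illustrative defaults, not the record's exact values):

```python
def lr_at(step, total_iters=50000, warmdown_iters=20000, base_lr=0.04):
    """Constant base LR, then linear decay to 0 over the final warmdown_iters."""
    decay_start = total_iters - warmdown_iters
    if step < decay_start:
        return base_lr
    return base_lr * (total_iters - step) / warmdown_iters
```

A longer warmdown means more late-training steps at a small LR, which plausibly leaves the weights closer to their quantization grid, consistent with the reduced int8 penalty reported above.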
32 hours | Update README.md | Will DePue
32 hours | Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) (#60) | notapplica
    * Add NTK Eval + Overtone Init submission (1.2160 BPB)
      Train@1024 with overtone embedding init and phase-transition residual
      mixing; eval@2048 with NTK-aware dynamic RoPE scaling. Mean val_bpb
      1.2160 across 3 seeds (p=0.0012 for a 0.0194-nat improvement over
      baseline).
      Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    * Update submission: Muon WD + NTK Eval + Overtone Init (1.2094 BPB, p=0.0002)
    * Update submission: 10-Layer + Muon WD + NTK Eval + Overtone Init (1.2029 BPB, p=0.0006)
    * Update submission: FP16 Embed + 10L + Muon WD + NTK + Overtone (1.2008 BPB)
    * Update submission: 1.2000 BPB: FP16 Embed + 10L + Muon WD + NTK@1408 + Overtone
    * Update: 1.1748 BPB: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone
    Co-authored-by: notapplica <notapplica@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
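The "NTK-aware dynamic RoPE scaling" used for eval@2048 after train@1024 is commonly implemented by stretching the RoPE base frequency. This is a sketch of the standard NTK-aware formula; the record's exact variant may differ:

```python
def ntk_rope_freqs(head_dim, eval_len, train_len=1024, base=10000.0):
    """RoPE inverse frequencies with NTK-aware rescaling for longer eval."""
    scale = max(eval_len / train_len, 1.0)
    # Raising the base by scale**(d/(d-2)) stretches the lowest frequency to
    # cover the longer window while barely moving the highest frequencies.
    ntk_base = base * scale ** (head_dim / (head_dim - 2))
    return [ntk_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
```

Compared with naive position interpolation, this leaves the fine-grained (high-frequency) dimensions nearly intact, which is why it tends to degrade short-range attention less.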
32 hours | Update README.md | Will DePue
32 hours | Update README.md | Will DePue
32 hours | Update README.md | Will DePue
32 hours | New SOTA attempt (#52) | spokane-way
    Co-authored-by: spokane-way <spokane@way>
32 hours | Update README.md | Will DePue
32 hours | Fix: score final partial window in sliding window eval (#124) | Matthew Li
    The window_starts filter dropped windows shorter than the stride, silently
    skipping up to (stride - 1) tokens at the end of the validation set. Now
    every window with >= 1 scoreable token is included, and the score start is
    clamped for short final windows.
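A sketch of the corrected sliding-window scoring loop this fix describes. The names (score_spans, window, stride) are illustrative, not the repo's actual eval code. Each window scores only the tokens it newly covers, and a short final window is kept rather than dropped:

```python
def score_spans(n_tokens, window=1024, stride=256):
    """Return (win_start, score_start, win_end) triples such that every
    token in [0, n_tokens) is scored exactly once."""
    spans = []
    for win_start in range(0, n_tokens, stride):
        win_end = min(win_start + window, n_tokens)
        # The first window scores everything it covers; later windows score
        # only the stride (or fewer, at the end) tokens past the previous one.
        score_start = 0 if win_start == 0 else win_start + window - stride
        score_start = min(score_start, win_end)  # defensive clamp at the end
        if score_start < win_end:                # keep any >=1-token window
            spans.append((win_start, score_start, win_end))
        if win_end == n_tokens:
            break
    return spans
```

The buggy version filtered out final windows with fewer than stride new tokens, so the last span in the example below would simply have been missing.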
36 hours | Add record: Sliding Window Eval (stride=64), val_bpb=1.1925 (#50) | Matthew Li
36 hours | Update README.md | Will DePue
36 hours | SOTA attempt (val_bpb=1.2064) (#49) | spokane-way
    * SOTA attempt
    * Improve score on SXM
    Co-authored-by: spokane-way <spokane@way>
36 hours | clarify torch version | Alex
36 hours | Update README.md (#105) | Will DePue
36 hours | fp16 tied embedding + lr/warmdown tuning: val_bpb 1.2197 (#42) | Renier Velazco
    Keep tok_emb.weight in fp16 during int8 export (which closes the
    quantization gap), shrink the MLP hidden size to 992 to fit under 16MB,
    and bump warmdown to 3600 and matrix LR to 0.06. Tested on 8xH100 SXM
    (2 seeds) and 8xH200 SXM (3 seeds).
    Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
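A sketch of the selective export this commit describes: the tied embedding stays fp16 while other weight matrices are quantized to int8 with one fp32 scale each. The tensor name check and byte layout are illustrative, not the repo's actual checkpoint format:

```python
import struct

def export_tensor(name, values):
    """Serialize one weight tensor: fp16 for the tied embedding, int8
    (with a single fp32 scale prefix) for everything else."""
    if name == "tok_emb.weight":
        # fp16: 2 bytes/value; avoids the int8 quantization gap on the
        # shared embedding / output head.
        return b"".join(struct.pack("<e", v) for v in values)
    # int8: one fp32 scale, then 1 byte per value (codes stored as
    # two's-complement bytes).
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 127.0 if amax else 1.0
    codes = bytes(round(v / scale) & 0xFF for v in values)
    return struct.pack("<f", scale) + codes
```

The size trade-off is the point: fp16 doubles the bytes spent on the embedding, hence the compensating MLP-width shrink to 992 to stay under the 16MB artifact cap.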
36 hours | Merge pull request #100 from sandsevenone/mlx_eager_eval | Will DePue
    Use eager mx.eval() to fix running the train script on 16GB Mac devices
36 hours | Add MLX_EAGER_EVAL flag to further reduce memory pressure by force-evaluating the graph after each sub-batch step | sandrone
2 days | Update README.md | Will DePue
2 days | Update train_gpt_mlx.py | Will DePue
2 days | Update train_gpt.py | Will DePue
2 days | Merge pull request #35 from openai/0hq-patch-1 | Will DePue
    Update README.md
2 days | Update README.md | Will DePue
2 days | Merge pull request #32 from yhn112/fix-mlx-eval-memory-growth | Will DePue
    Fix MLX multi-batch validation memory growth
2 days | Merge pull request #9 from oof-baroomf/patch-1 | Will DePue
    Update README typo
2 days | Merge pull request #18 from berniwal/main | Will DePue
    MLX Timing Mismatch with Main Script
2 days | Log MLX validation progress | Michael Diskin
2 days | Fix MLX validation loss accumulation | Michael Diskin
2 days | match timing to main script to exclude eval timing | bernhardwalser
2 days | Update README typo | Dhruv Saini
2 days | Remove scripts | Will DePue
3 days | Update README.md | Will DePue
3 days | Launch snapshot | Will DePue