<feed xmlns='http://www.w3.org/2005/Atom'>
<title>parameter-golf.git/records, branch main</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/'/>
<entry>
<title>commit ttt record (#77)</title>
<updated>2026-03-19T22:30:26+00:00</updated>
<author>
<name>Sam Acquaviva</name>
<email>samacqua@gmail.com</email>
</author>
<published>2026-03-19T22:30:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=bd2463aaa21fec47f643c593d86d4dd385d474e9'/>
<id>bd2463aaa21fec47f643c593d86d4dd385d474e9</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Record: 10L Mixed Precision: val_bpb=1.2147 (10 layers + int6 middle layers) (#39)</title>
<updated>2026-03-19T22:26:46+00:00</updated>
<author>
<name>Nan Liu</name>
<email>45443761+nanlliu@users.noreply.github.com</email>
</author>
<published>2026-03-19T22:26:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=9ac12c26d550481a1a486ce2b450b1ffed60b832'/>
<id>9ac12c26d550481a1a486ce2b450b1ffed60b832</id>
<content type='text'>
* Add Lower LR submission: val_bpb=1.2230 (MATRIX_LR=0.02)

A systematic LR sweep showed that the default Muon/Adam learning rates (0.04)
were too high. Setting MATRIX_LR=0.02, SCALAR_LR=0.02, and TIED_EMBED_LR=0.03
gives a consistent improvement. Same 9L/512d architecture, no other changes.

* Add 10L Mixed Precision submission: val_bpb=1.2147

10 transformer layers (vs baseline 9) with mixed int8/int6 compression:
- Full int8 for the first and last 3 layers, 0-2 and 7-9 (precision-sensitive)
- Int6 (step=4 rounding) for middle layers 3-6 (compression-friendly)
- Lower LR: MATRIX_LR=0.02, SCALAR_LR=0.02, TIED_EMBED_LR=0.03
- Artifact: 15,928,974 bytes (under the 16MB cap)
- Improvement: 0.0097 bpb / 0.0217 nats over baseline (1.2244)
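
A minimal sketch of the mixed int8/int6 export described above. It assumes per-row absmax int8 quantization and reads "step=4 rounding" as snapping each int8 code to the nearest multiple of 4, which leaves 64 levels (6 bits of information). The helper names (quantize_int8, snap_to_step, export_mixed) are hypothetical and not taken from train_gpt.py; layer indices are assumed to be 0-based.

```python
import numpy as np


def quantize_int8(w):
    # Hypothetical helper: per-row symmetric int8 with one float scale per row.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)


def snap_to_step(q, step):
    # Round each int8 code to the nearest multiple of `step` (assumed reading
    # of "step=4 rounding"); step=4 leaves at most 64 levels, about 6 bits.
    hi = (127 // step) * step
    lo = -(128 // step) * step
    snapped = np.round(q.astype(np.int32) / step) * step
    return np.clip(snapped, lo, hi).astype(np.int8)


def export_mixed(layers, int6_layers=range(3, 7), step=4):
    # Full int8 for the outer layers, step-snapped codes for the middle ones.
    packed = []
    for i, w in enumerate(layers):
        q, scale = quantize_int8(w)
        if i in int6_layers:
            q = snap_to_step(q, step)
        packed.append((q, scale))
    return packed


def dequantize(q, scale):
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 128)).astype(np.float32) for _ in range(10)]
packed = export_mixed(layers)
for i, (q, scale) in enumerate(packed):
    err = float(np.abs(dequantize(q, scale) - layers[i]).mean())
    print(f"layer {i}: {len(np.unique(q))} distinct codes, mean abs error {err:.5f}")
```

In this sketch layers 3-6 end up with at most 64 distinct codes and a larger reconstruction error than the outer layers. The commit's claim is that the middle layers tolerate that loss while their coarser codes compress better in the final artifact; the code only illustrates the mechanism, not the val_bpb result.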

Also adds PRUNE_RATIO and INT4_LAYERS/INT4_STEP support to train_gpt.py
for mixed-precision post-training quantization.

* Revert root train_gpt.py to upstream baseline

The root script should remain the baseline. Submission-specific
modifications (PRUNE_RATIO, INT4_LAYERS, INT4_STEP) belong only
in the copy under records/.</content>
</entry>
<entry>
<title>Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61)</title>
<updated>2026-03-19T21:28:57+00:00</updated>
<author>
<name>Sam Larson</name>
<email>166414725+saml212@users.noreply.github.com</email>
</author>
<published>2026-03-19T21:28:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=555669e8330472143139c2f82bba15baab1a5e0d'/>
<id>555669e8330472143139c2f82bba15baab1a5e0d</id>
<content type='text'>
* Warmdown-quantization co-optimization, val_bpb=1.2154

Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces the int8 quantization
penalty from 0.014 to 0.005 BPB. This is combined with FP16 tied embeddings and moderate
NTK-RoPE extrapolation (eval@1408).

The README includes the full warmdown sweep across 10 values and a detailed analysis.

* breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256
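
A minimal sketch of stride-based sliding-window evaluation such as the stride=256 setting above. It assumes each window feeds seq_len tokens to the model, windows start every stride tokens, and each window scores only the positions no earlier window has scored, so every token outside the first window is scored with at least seq_len - stride tokens of left context. The function name and the (start, end, score_start) layout are illustrative, not the repository's eval code.

```python
def sliding_windows(n_tokens, seq_len, stride):
    """Plan (start, end, score_start) windows covering positions [0, n_tokens).

    Each window feeds tokens [start, end) to the model and scores only
    [score_start, end), so every position is scored exactly once. The final
    window may hold fewer than `stride` new tokens; it is kept, not dropped.
    """
    assert seq_len >= stride >= 1, "stride must be in [1, seq_len]"
    windows = []
    scored_upto = 0  # every position before this index is already scored
    start = 0
    while n_tokens > scored_upto:
        end = min(start + seq_len, n_tokens)
        # With stride no larger than seq_len, start never passes scored_upto,
        # so scoring resumes exactly where the previous window stopped.
        score_start = scored_upto
        windows.append((start, end, score_start))
        scored_upto = end
        start += stride
    return windows


print(sliding_windows(11, seq_len=4, stride=2))
# [(0, 4, 0), (2, 6, 4), (4, 8, 6), (6, 10, 8), (8, 11, 10)]
```

With 11 tokens, seq_len=4 and stride=2, the last window contributes a single scored token (position 10). A planner that requires each window to contain a full stride of new tokens would silently skip it.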

---------

Co-authored-by: Sam Larson &lt;saml212@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) (#60)</title>
<updated>2026-03-19T21:13:10+00:00</updated>
<author>
<name>notapplica</name>
<email>yadunanll@gmail.com</email>
</author>
<published>2026-03-19T21:13:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=9fbdf8c949a909c8701857e379004fe9e11098c2'/>
<id>9fbdf8c949a909c8701857e379004fe9e11098c2</id>
<content type='text'>
* Add NTK Eval + Overtone Init submission (1.2160 BPB)

Train@1024 with overtone embedding init and phase-transition residual
mixing, eval@2048 with NTK-aware dynamic RoPE scaling. Mean val_bpb
1.2160 across 3 seeds (p=0.0012 for 0.0194-nat improvement over baseline).
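
A minimal sketch of NTK-aware RoPE scaling for the train@1024 / eval@2048 setup above. It assumes standard RoPE with base 10000 and a head dimension of 64 (neither is stated in the commit) and uses the common NTK-aware rule base' = base * s^(d/(d-2)) with s = eval_len / train_len. Recomputing s from the current sequence length is what makes a scheme dynamic; the submission's exact formula may differ from this one.

```python
import numpy as np


def ntk_base(base, head_dim, train_len, seq_len):
    # NTK-aware RoPE base for a sequence of length seq_len; unchanged when
    # seq_len does not exceed the training length.
    s = max(1.0, seq_len / train_len)
    return base * s ** (head_dim / (head_dim - 2))


def rope_inv_freq(base, head_dim):
    # Standard RoPE inverse frequencies, one per pair of channels.
    return 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)


base, head_dim = 10000.0, 64  # assumed values, not taken from the commit
orig = rope_inv_freq(base, head_dim)
scaled = rope_inv_freq(ntk_base(base, head_dim, train_len=1024, seq_len=2048), head_dim)

# Ratios: 1.0 for the fastest-rotating pair, about 0.5 (= 1024/2048) for the
# slowest, so the longest wavelengths stretch to cover the longer context.
print(scaled[0] / orig[0], scaled[-1] / orig[-1])
```

The design point of the d/(d-2) exponent is that the highest frequencies, which encode local order, are left almost untouched while the lowest frequency is slowed by exactly the context-length ratio.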

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;

* Update submission: Muon WD + NTK Eval + Overtone Init (1.2094 BPB, p=0.0002)

* Update submission: 10-Layer + Muon WD + NTK Eval + Overtone Init (1.2029 BPB, p=0.0006)

* Update submission: FP16 Embed + 10L + Muon WD + NTK + Overtone (1.2008 BPB)

* Update submission: 1.2000 BPB — FP16 Embed + 10L + Muon WD + NTK@1408 + Overtone

* Update: 1.1748 BPB — Sliding Window + FP16 Embed + 10L + Muon WD + Overtone

---------

Co-authored-by: notapplica &lt;notapplica@users.noreply.github.com&gt;
Co-authored-by: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;</content>
</entry>
<entry>
<title>New SOTA attempt (#52)</title>
<updated>2026-03-19T21:04:16+00:00</updated>
<author>
<name>spokane-way</name>
<email>marthaludwigsdottir@gmail.com</email>
</author>
<published>2026-03-19T21:04:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=78c24e20145736fb48737e480bf446a600baf6a0'/>
<id>78c24e20145736fb48737e480bf446a600baf6a0</id>
<content type='text'>
Co-authored-by: spokane-way &lt;spokane@way&gt;</content>
</entry>
<entry>
<title>Fix: score final partial window in sliding window eval (#124)</title>
<updated>2026-03-19T21:00:42+00:00</updated>
<author>
<name>Matthew Li</name>
<email>156706407+mattqlf@users.noreply.github.com</email>
</author>
<published>2026-03-19T21:00:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=3a6fec7941f3cc4187ac208496b6842f3563e18c'/>
<id>3a6fec7941f3cc4187ac208496b6842f3563e18c</id>
<content type='text'>
The window_starts filter dropped windows shorter than stride,
silently skipping up to (stride-1) tokens at the end of the
validation set. The filter now keeps every window with at least one
scorable token and clamps the score start for short final windows.</content>
</entry>
<entry>
<title>Add record: Sliding Window Eval (stride=64), val_bpb=1.1925 (#50)</title>
<updated>2026-03-19T17:28:12+00:00</updated>
<author>
<name>Matthew Li</name>
<email>156706407+mattqlf@users.noreply.github.com</email>
</author>
<published>2026-03-19T17:28:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=d84a3e819100504d96879e1e36d022efa5cbb81b'/>
<id>d84a3e819100504d96879e1e36d022efa5cbb81b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>SOTA attempt (val_bpb=1.2064) (#49)</title>
<updated>2026-03-19T17:25:29+00:00</updated>
<author>
<name>spokane-way</name>
<email>marthaludwigsdottir@gmail.com</email>
</author>
<published>2026-03-19T17:25:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=e89fcf8acf8e9fd3bf63e9809c160a4e510be61b'/>
<id>e89fcf8acf8e9fd3bf63e9809c160a4e510be61b</id>
<content type='text'>
* SOTA attempt

* Improve score on SXM

---------

Co-authored-by: spokane-way &lt;spokane@way&gt;</content>
</entry>
<entry>
<title>fp16 tied embedding + lr/warmdown tuning — val_bpb 1.2197 (#42)</title>
<updated>2026-03-19T17:16:50+00:00</updated>
<author>
<name>Renier Velazco</name>
<email>renier.velazco94@gmail.com</email>
</author>
<published>2026-03-19T17:16:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=a5eb9edbfb7391a5e323cd062222ad9bfe846974'/>
<id>a5eb9edbfb7391a5e323cd062222ad9bfe846974</id>
<content type='text'>
keep tok_emb.weight in fp16 during int8 export (kills the quant gap),
shrink MLP hidden to 992 to fit under 16MB, bump warmdown to 3600
and matrix LR to 0.06.

tested on 8xH100 SXM (2 seeds) and 8xH200 SXM (3 seeds).

Co-authored-by: Copilot &lt;223556219+Copilot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Launch snapshot</title>
<updated>2026-03-18T16:32:01+00:00</updated>
<author>
<name>Will DePue</name>
<email>williamd@openai.com</email>
</author>
<published>2026-03-18T16:32:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=a15093adad328a650d421e53c078cbd2c45beb0e'/>
<id>a15093adad328a650d421e53c078cbd2c45beb0e</id>
<content type='text'>
</content>
</entry>
</feed>
