<feed xmlns='http://www.w3.org/2005/Atom'>
<title>parameter-golf.git, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/'/>
<entry>
<title>Update README.md</title>
<updated>2026-03-19T23:30:22+00:00</updated>
<author>
<name>Alex Zhao</name>
<email>alexzhao@openai.com</email>
</author>
<published>2026-03-19T23:30:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=45bbccff356439d2f0b0dbae06cc3fa58b9576ed'/>
<id>45bbccff356439d2f0b0dbae06cc3fa58b9576ed</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Update README.md</title>
<updated>2026-03-19T22:32:27+00:00</updated>
<author>
<name>Will DePue</name>
<email>williamd@openai.com</email>
</author>
<published>2026-03-19T22:32:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=5e29bfd388b5416dee31c1d5079eebf4ee5c310d'/>
<id>5e29bfd388b5416dee31c1d5079eebf4ee5c310d</id>
<content type='text'>
</content>
</entry>
<entry>
<title>commit ttt record (#77)</title>
<updated>2026-03-19T22:30:26+00:00</updated>
<author>
<name>Sam Acquaviva</name>
<email>samacqua@gmail.com</email>
</author>
<published>2026-03-19T22:30:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=bd2463aaa21fec47f643c593d86d4dd385d474e9'/>
<id>bd2463aaa21fec47f643c593d86d4dd385d474e9</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Record: 10L Mixed Precision: val_bpb=1.2147 (10 layers + int6 middle layers) (#39)</title>
<updated>2026-03-19T22:26:46+00:00</updated>
<author>
<name>Nan Liu</name>
<email>45443761+nanlliu@users.noreply.github.com</email>
</author>
<published>2026-03-19T22:26:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=9ac12c26d550481a1a486ce2b450b1ffed60b832'/>
<id>9ac12c26d550481a1a486ce2b450b1ffed60b832</id>
<content type='text'>
* Add Lower LR submission: val_bpb=1.2230 (MATRIX_LR=0.02)

Systematic LR sweep showed default Muon/Adam learning rates (0.04) were
too high. MATRIX_LR=0.02, SCALAR_LR=0.02, TIED_EMBED_LR=0.03 gives
consistent improvement. Same 9L/512d architecture, no other changes.

* Add 10L Mixed Precision submission: val_bpb=1.2147

10 transformer layers (vs baseline 9) with mixed int8/int6 compression:
- Full int8 for first/last 3 layers (precision-sensitive)
- Int6 (step=4 rounding) for middle layers 3-6 (compression-friendly)
- Lower LR: MATRIX_LR=0.02, SCALAR_LR=0.02, TIED_EMBED_LR=0.03
- Artifact: 15,928,974 bytes (under 16MB cap)
- Improvement: 0.0097 bpb / 0.0217 nats over baseline (1.2244)

Also adds PRUNE_RATIO and INT4_LAYERS/INT4_STEP support to train_gpt.py
for mixed-precision post-training quantization.

* Revert root train_gpt.py to upstream baseline

The root script should remain the baseline. Submission-specific
modifications (PRUNE_RATIO, INT4_LAYERS, INT4_STEP) only belong
in the records/ folder copy.</content>
</entry>
<entry>
<title>Update README.md</title>
<updated>2026-03-19T22:26:36+00:00</updated>
<author>
<name>Will DePue</name>
<email>williamd@openai.com</email>
</author>
<published>2026-03-19T22:26:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=ae882089b58c74d37a02eda8358219f41cd5f4e1'/>
<id>ae882089b58c74d37a02eda8358219f41cd5f4e1</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Update README.md</title>
<updated>2026-03-19T22:03:04+00:00</updated>
<author>
<name>Will DePue</name>
<email>williamd@openai.com</email>
</author>
<published>2026-03-19T22:03:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=d2bd76079bfdf9c9847f1e88593e3beb9e4fa9da'/>
<id>d2bd76079bfdf9c9847f1e88593e3beb9e4fa9da</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Update README.md</title>
<updated>2026-03-19T21:55:57+00:00</updated>
<author>
<name>Will DePue</name>
<email>williamd@openai.com</email>
</author>
<published>2026-03-19T21:55:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=535352463e08b52a602d33ed8cf24f1379addee7'/>
<id>535352463e08b52a602d33ed8cf24f1379addee7</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Update README.md</title>
<updated>2026-03-19T21:33:47+00:00</updated>
<author>
<name>Will DePue</name>
<email>williamd@openai.com</email>
</author>
<published>2026-03-19T21:33:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=f3897c16bb913640c2b65d2e82addab307245034'/>
<id>f3897c16bb913640c2b65d2e82addab307245034</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Update README.md</title>
<updated>2026-03-19T21:31:28+00:00</updated>
<author>
<name>Will DePue</name>
<email>williamd@openai.com</email>
</author>
<published>2026-03-19T21:31:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=cfa5726b25f16f5330fd1d0b5343a5f28a5b6d11'/>
<id>cfa5726b25f16f5330fd1d0b5343a5f28a5b6d11</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Int6 + MLP 3x + sliding window: val_bpb=1.1574 (#61)</title>
<updated>2026-03-19T21:28:57+00:00</updated>
<author>
<name>Sam Larson</name>
<email>166414725+saml212@users.noreply.github.com</email>
</author>
<published>2026-03-19T21:28:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/parameter-golf.git/commit/?id=555669e8330472143139c2f82bba15baab1a5e0d'/>
<id>555669e8330472143139c2f82bba15baab1a5e0d</id>
<content type='text'>
* Warmdown-quantization co-optimization, val_bpb=1.2154

Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces int8 quantization
penalty from 0.014 to 0.005 BPB. Combined with FP16 tied embeddings and moderate
NTK-RoPE extrapolation (eval@1408).

Full warmdown sweep across 10 values and detailed analysis in README.

* breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256

---------

Co-authored-by: Sam Larson &lt;saml212@users.noreply.github.com&gt;</content>
</entry>
</feed>
