| author | Nan Liu <45443761+nanlliu@users.noreply.github.com> | 2026-03-19 15:26:46 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-03-19 15:26:46 -0700 |
| commit | 9ac12c26d550481a1a486ce2b450b1ffed60b832 | |
| tree | fa30460ec2e96320f9f8761c31df31f798490f94 /.gitignore | |
| parent | ae882089b58c74d37a02eda8358219f41cd5f4e1 | |
Record: 10L Mixed Precision: val_bpb=1.2147 (10 layers + int6 middle layers) (#39)
* Add Lower LR submission: val_bpb=1.2230 (MATRIX_LR=0.02)
A systematic LR sweep showed the default Muon/Adam learning rates (0.04) were
too high. MATRIX_LR=0.02, SCALAR_LR=0.02, and TIED_EMBED_LR=0.03 give a
consistent improvement. Same 9L/512d architecture; no other changes.
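As a rough sketch, the swept learning rates could be wired in as environment-variable overrides. The 0.04 defaults and the override values come from the message above; the `os.environ` mechanism and these constant names are assumptions, not the actual train_gpt.py code:

```python
import os

# Hypothetical LR overrides; defaults are the 0.04 baseline rates,
# the submission sets 0.02 / 0.02 / 0.03 respectively.
MATRIX_LR = float(os.environ.get("MATRIX_LR", "0.04"))          # submission: 0.02
SCALAR_LR = float(os.environ.get("SCALAR_LR", "0.04"))          # submission: 0.02
TIED_EMBED_LR = float(os.environ.get("TIED_EMBED_LR", "0.04"))  # submission: 0.03
```

Running with `MATRIX_LR=0.02 SCALAR_LR=0.02 TIED_EMBED_LR=0.03 python train_gpt.py` would then reproduce the swept setting without touching the script.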
* Add 10L Mixed Precision submission: val_bpb=1.2147
10 transformer layers (vs baseline 9) with mixed int8/int6 compression:
- Full int8 for the first three and last three layers (precision-sensitive)
- Int6 (step=4 rounding of the int8 codes) for middle layers 3-6 (compression-friendly)
- Lower LR: MATRIX_LR=0.02, SCALAR_LR=0.02, TIED_EMBED_LR=0.03
- Artifact: 15,928,974 bytes (under 16MB cap)
- Improvement: 0.0097 bpb / 0.0217 nats over baseline (1.2244)
Also adds PRUNE_RATIO and INT4_LAYERS/INT4_STEP support to train_gpt.py
for mixed-precision post-training quantization.
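The int6-via-step-rounding idea can be sketched as follows. This is a minimal illustration under the assumption of symmetric per-tensor int8 quantization; `quantize_int8` and `to_int6` are hypothetical names, not the actual train_gpt.py implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: float weights -> int8 codes + scale."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def to_int6(q: np.ndarray, step: int = 4) -> np.ndarray:
    """Round int8 codes to multiples of `step`.

    step=4 leaves 64 distinct levels (-128..124), i.e. ~6 bits of
    information per weight; the lower entropy is what makes these
    layers compress better in the final artifact.
    """
    levels = np.clip(np.round(q.astype(np.int32) / step), -128 // step, 127 // step)
    return (levels * step).astype(np.int8)
```

Under this sketch, the mixed scheme would keep the raw int8 codes for the first and last three layers and apply `to_int6` only to the weight tensors of middle layers 3-6.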
* Revert root train_gpt.py to upstream baseline
The root script should remain the baseline. Submission-specific
modifications (PRUNE_RATIO, INT4_LAYERS, INT4_STEP) only belong
in the records/ folder copy.
Diffstat (limited to '.gitignore')
0 files changed, 0 insertions, 0 deletions
